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ABSTRACT 

This  thesis  summarizes  the  theory  of  number  theoretic 
transforms  (NTT's),  and  presents  original  examples  to 
illustrate  the  theory.  Concepts  have  been  studied  and 
compared  in  order  to  present  them  in  a cohesive  ani  unified 
manner. 

Software  and  hardware  implementation  of  Fermat  number 
transforms  are  discussed  and  compared  with  the  Fourier 
Transform  showing  a substantial  improvement  in  efficiency 
and  accuracy.  The  main  drawback  of  Fermat  Number  Transforms 
is  a rigid  relationship  between  the  allowed  sequence  length 
and  word  length.  Methods  and  other  NTT's,  for  overcoming 
this  problem  are  discussed.  The  theory  has  also  been 
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I . 


INTRODUCTION 


With  the  rapid  advances  in  large  scale  integration,  a 
growing  number  of  complex  digital  signal  processing  appli- 
cations are  becoming  economically  feasible.  In  most  cases 
the  bulk  of  processing  workload  appears  to  consist  of 
digital  filter  computation.  Future  progress  in  digital 
signal  processing,  either  towards  high  speed,  real  time 
operation  or  increased  sophistication,  thus  largely  depends 
on  increased  efficiency  in  digital  filtering  computation. 
This  can  be  achieved  not  only  by  implementing  improved 
filter  circuits  but  also  by  using  better  computational 
algorithms  as  will  be  discussed  in  this  thesis. 

Schonhaje  and  Strussen  [1]  (also  see  text  by  Knuth 

[2, p.269])  defined  Fourier-like  transforms  over  the  ring 

2n 

of  integers,  modulo  the  Fermat  numbers  [2]  2 + 1,  to 

yield  convolutions.  They  showed  that  such  convolutions 
can  be  used  to  perform  fast  integer  multiplications.  Rader 
[3]  and  Agarwal  and  Burrus  [4]  also  defined  Fourier-like 
transforms  over  residue  classes  of  integers,  modulo  the 
Fermat  and  Mersenne  primes,  to  compute  convolutions  of  the 
real  integer  sequences. 

For  this  presentation  and  exposition  of  Number  Theoretic 
Transforms  (NTT) , the  works  of  many  different  authors  were 
consulted.  Similar  concepts  have  been  studied  and  compared 
in  order  to  present  them  in  a cohesive  and  unified  manner. 
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In  section  II  a mathematical  framework  for  these  new 
transforms  is  presented.  This  framework  is  based  on  the 
theory  of  congruences,  modulo  M,  which  belongs  to  the  general 
area  of  what  is  often  called  "number  theory."  Number  theory 
is  very  old,  going  back  several  thousand  years;  at  least  as 
far  back  as  Euclid,  who  proved  some  of  the  original  results. 
The  names  of  many  other  famous  mathematicians  are  also  asso- 
ciated with  the  theory,  including  Joseph  Lagrange  (1763- 
1813),  and  Leonard  Euler  (1707-1783),  and  Carl  Gauss  (1777- 
1855)  who  is  responsible  for  many  contributions  to  the  area, 
some  of  which  were  published  in  his  book  Disquisitiones 
Arithmeticae,  published  in  1801,  when  he  was  24  [5].  An 
excellent  history  of  the  theory  of  numbers  is  found  in  the 
work  of  Dickson  [6] . 

In  recent  years,  there  has  been  increasing  interest  in 
the  practical  applications  of  various  parts  of  number  theory, 
including  the  theory  of  residue  number  systems.  There  has 
been  some  work  on  the  use  of  these  number  systems  in  general- 
purpose  computers  [7]  — [12] , although  this  line  of  investiga- 
tion has  not  yielded  many  practical  results  due  to  the 
difficulty  of  determining  the  sign  of  numbers  expressed  in 
residue  number  system  notation.  More  promising  results  have 
been  obtained  in  applications  where  sign  detection  is  not 
required,  such  as  number  theoretic  transforms  [3],  [4] 
and  [13] . 

The  possibility  of  performing  the  fast  Fourier  transform 
(FFT)  in  finite  algebraic  systems  [14],  [15]  and  [1],  is 
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being  increasingly  discussed  as  a means  of  digital  filtering 
[3],  [13]  and  [16]  (see  also  references  in  Reference  [13]). 

The  convolution  property  which  such  transforms  share  with 
the  conventional  f.f.t.  can  be  employed  to  construct  a non- 
recursive digital  filter  in  which  rounding  error  does  not 
occur. 

The  following  type  of  transforms  have  been  discussed  in 
the  literature. 

(a)  Transforms  in  arithmetic  mod  p,  p prime,  in  which 
the  order  (number  of  points) , d,  for  example,  is 
a factor  of  p-1. 

(b)  Transforms  in  arithmetic  mod  m,  m arbitrary,  in 
which  d dividies  p-1  for  each  prime  factor  p of 
m. 

(c)  Transforms  in  an  arbitrary  finite  Galois  field 
GF(pn)  of  pn  elements,  where  d divides  pn-l. 

Here  class  (a)  is  treated  as  a special  case  of  both  (b)  and 
(c) , which  are  essentially  different.  This  material  is 
presented  in  section  III  which  discusses  the  Fourier  trans- 
form in  a finite  field  in  a broad  way. 

The  best  known  number-theoretic  transform  is  the  Fermat- 
number  transform  [1],  [3],  [4],  [13],  [16].  Due  to  the 
simplicity  of  its  arithmetic,  such  a transform  is  the  fastest 
method  known  so  far  for  computing  integer  convolutions 
under  certain  conditions.  However,  this  transform  suffers 
from  the  disadvantage  that  the  restriction  imposed  :>n  the 
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register  word  length  is  often  too  excessive  [13] . In 
order  to  remedy  this  problem,  Rader  [3]  has  suggested  using 
a two-dimensional  convolution  scheme  to  convolve  long  one- 
dimensional sequences,  and  Agarwal  and  Burrus  [17],  [18]  have 

presented  such  a two-dimensional  convolution  scheme.  Other 
authors  [19],  [20],  [21],  [22],  [23]  have  also  considered 
other  number-theoretic  transforms  (NTT) . Section  IV  dis- 
cusses the  various  types  of  NTT. 

These  transforms  provide  more  choices  of  word  length 
and  transform  length,  at  speeds  which  are  not  attainable  by 
the  Fermat- number  transforms  under  similar  conditions. 
Problems  involved  in  the  implementation  of  Fermat-number 
transforms  are  discussed  in  section  V. 

Section  VI  deals  with  the  comparison  between  the  FFT  and 
the  FNT . Software  and  hardware  requirements  for  both  are 
analyzed  and  compared.  Agarwal  [13]  programmed  Fermat 
number  transforms  on  the  IBM  370/155  computer  [13]  and  showed 
how  to  compute  convolutions  approximately  three  times  as 
fast  as  the  FFT  implementation  for  the  same  convolution. 
However,  their  main  drawback  is  a rigid  relationship  between 
word  length  and  obtainable  transform  length,  as  well  as  a 
limited  choice  of  possible  word  lengths.  This  last  point 
is  especially  significant  for  FNT's,  and  may  result  in  a 
waste  of  computing  power  when  the  permissible  word  lengths 
do  not  correspond  to  the  dynamic  range  required  for  the 
convolution. 
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In  principle,  Number  Theoretic  Transforms  (NTT)  could 
be  implemented  in  the  same  was  as  Discrete  Fourier  Trans- 
forms with  multiplications  by  trigonometric  functions 
replaced  by  multiplications  by  powers  of  two,  all  operations 
being  performed  modulo  a Mersenne  or  Fermat  Number.  When 
the  transforms  have  a composite  number  of  terms,  as  is  the 
case  with  Fermat  Number  Transforms  (FNT)  or  some  pseudo- 
Mersenne  Transforms  [24],  various  pipeline  computing  tech- 
niques can  be  used  [25] . 

In  practice,  however,  direct  transposition  of  Fast 
Fourier  Transforms  (FFT)  architectures  does  not  necessarily 
lead  to  optimum  implementations  and  the  development  of 
special  configurations  for  computing  NTT  seems  worth  exploring. 
Along  these  lines,  McClellan  [26]  has  proposed  a new  coding 
technique  for  simplifying  the  implementation  of  Fermat 
Number  Transforms. 

McClellan  implemented  a FNT  convolver  for  radar  signal 
processing  and  reached  some  interesting  conclusions. 

Leibowitz  [27]  presents  a code  translation  which  is  mathe- 
matically simpler,  and  this  proposed  arithmetic  provides 
simpler  realizations  of  all  operations  required  to  compute 
the  FNT. 

Nussbaumer  [28]  discusses  the  implementation  of  pseudo- 
Mersenne  and  Fermat  Number  Transforms.  He  shows  that  some 
pseudo-Mersenne  Transforms  can  be  computed  efficiently  by  a 
linear  filtering  approach.  This  approach  is  extended  to 
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cover  the  case  of  Fermat  and  pseudo-Fermat  Number  Transforms 
by  using  a special  coding  scheme  for  implementing  arithmetic 
operations  in  a Fermat  Number  system.  A number  of  sugges- 
j tions  have  arisen  for  lengthening  the  sequences  which  can 

be  handled  by  the  NTT.  One  suggestion  is  to  perform  the 
calculation  modulo  several  mutually  prime  modulo,  and  then 
obtain  the  desired  result  by  using  the  Chinese  Remainder 
theorem  [13],  [29].  Reed  and  Truong  [30]  have  also  shown 
how  one  can  extend  the  method  to  Galois  fields  over  complex 
integers  modulo  Mersenne  primes  to  enable  one  to  use  the 
, FFT  algorithm  to  compute  convolutions  of  complex  sequences, 

and  to  lengthen  the  sequences  which  the  method  can  handle. 
However,  because  this  method  requires  several  multiplications, 
it  does  not  seem  very  promising. 

One  of  the  most  promising  methods  for  lengthening  the 
sequence  one  can  handle  has  been  suggested  by  Rader  [3] 
and  developed  by  Agarwal  and  BurruG  [31] . This  consisted  of 
mapping  the  one-dimensional  sequences  into  multidimensional 
sequences  and  expressing  the  convolution  as  a multidimen- 
sional convolution.  In  Appendix  A an  explanation  of  the 
* process  of  two  dimensional  convolution  for  convolving  long 

sequences  is  presented.  By  the  use  of  an  examp ie  it  is 
shown  that  this  process  improves  the  length  of  the  sequences 
handled  by  the  NTT. 

k 

With  knowledge  of  the  advantages  and  disadvantages  of 
Fermat  Number  Transforms,  it  is  possible  to  speculate  on 
just  what  type  of  problems  are  likely  to  benefit  from  this 
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new  approach.  In  general  one  looks  for  problems  which  have 
the  following  characteristics: 

1)  Fairly  short  sequences  (about  fifty  delayed  products) 

2)  A high  accuracy  requirement 

3)  Implementations  where  multiplications  are  very  much 
more  costly  than  additions. 

Two  specific  situations  come  to  mind.  The  first  is  the 
estimation  of  magnitude  spectra  of  (many  simultaneous) 
wideband  signals.  The  theory  of  power  spectra  estimation 
states  that  power  density  computations  involves  formulating 
a correlation  function  with  a finite  number  of  delays 
which  are  a fraction  of  the  number  of  data  points  available. 

The  second  situation  is  two  dimensional  finite  impulse 
response  filtering.  Here  we  may  consider  an  arbitrary  LxL 
impulse  response  to  be  applied  to  a large  image.  If  L is 
in  the  range  of  52  points,  the  FFT  is  not  particularly  attrac- 
tive for  convolution,  although  more  so  for  two  dimensional 
convolution  than  for  one  dimensional  convolution. 

The  Fermat  Number  Transform  is  quite  effective,  however. 
For  impulse  responses  of  this  size,  the  number  of  multipli- 
cations is  reduced  by  about  two  orders  of  magnitude  over  the 
direct  method,  in  exchange  for  a number  of  additions  which 
are  not  too  different  (usually  less)  than  required  for  the 
direct  method.  Thus  it  can  be  expected  that  the  Fermat 
number  transform  will  soon  play  a part  in  the  computation 
required  for  the  filtering  of  pictures. 
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Recently,  Derome  [32]  discusses  a class  of  NTT's  based 
on  three  bit  primes,  having  many  of  the  computational 
advantages  of  the  FNT's.  These  NTT's  can  have  much  larger 
transform  lengths  than  those  for  FNT's  so  that  the  fast 
convolution  of,  for  example,  a 1000  x 1000  point  picture 
with  a 24  x 24  point  spread  function  should  be  possible  in 
a minicomputer!  This  development  was  undertaken  in  connec- 
tion with  analysis  of  high  resolution  electron  micrographs. 


II.  MODULAR  ARITHMETIC 


Let  a and  b be  integers.  We  say  that  "a  divides  b" 
(denoted  by  "a|b" ) and  "b  is  multiple  of  a"  if  there  exists 
an  integer  c such  that  b = ac. 

If  a does  not  divide  b,  we  denote  the  fact  by  "a^b". 

An  important  theorem  concerning  the  division  of  integers 
(the  Division  Algorithm),  is: 

Let  a and  b be  integers,  b not  zero.  Then  there  exist 
two  unique  integers,  q and  r,  such  that 

a * bq  + r and  0 < r < b (2.1) 

The  integers  q and  r,  are  called  the  "quotient"  and  the 
"remainder"  respectively. 

If  a,  b and  M,  are  integers,  with  M > 0,  such  that 


M | (a  - b)  (2.2) 

we  say  that  "a  is  congruent  to  b,  modulus  M" , and  we  denote 
this  fact,  by  writing 

a = b (mod  M)  (2.3) 

In  other  words,  two  integers  a and  b are  congruent  mod  M, 
if  M divides  their  difference. 
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We  will  refer  to  M as  the  "modulus 


Note  that 


1)  23  = 8 (mod  5)  since  23  - 8 = 15  = 5-3 


and 


2)  23  = 3 (mod  5)  since  23  - 3 = 20  = 5-4 

are  both  true  statements. 

We  restrict  our  attention  to  what  is  called  a complete 
residue  system,  mod  M,  as  the  set  of  integers 

ZM  = (0,  1,  2,  ...,  M- 1 } . (2.4) 

In  other  words,  when  an  integer  a is  divided  by  another  M 
1*6*/ 


a = KM  + b 

where  the  remainder  b,  is  some  positive  integer  less  than 
M,  there  exists  a congruence 

a = b (mod  M)  (2.5) 

such  that  b is  a unique  integer  among  the  numbers 

0,  1,  2,  ...  M-l  . 
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Thus,  one  possible  mechanization  of  residue  reduction, 
is  to  divide  by  the  modulus  and  keep  only  the  remainder. 
Examples : 

. 4 7 

1)  47  = 2 (mod  9)  since  - 5 remainder  = 2 

81 

2)  81  H 0 (mod  27)  since  2=j  ~ 3 remainder  = 0 

Both  2 and  0 are  contained  only  once  within  the  set  of 

integers  (0,1,..., 8}  and  {0,1,2,  . ..,  26}  respectively. 

The  last  example  shows  that,  in  general  instead  of  saying 
that  a number  a is  divisible  by  the  number  M we  can  write 

a = 0 (mod  M) 

For  this  means  a - 0 = a = Mk,  where  k is  some  integer. 

For  instance,  instead  of  saying  that  a is  an  even 
number,  we  can  write 

a = 0 (mod  2) 

In  the  same  manner  one  sees  that  an  odd  number  satisfies 

a = 1 (mod  2 ) . 

In  working  with  residue  reduction  we  will  drop  the 
symbol  = for  congruence  and  will  use  the  symbol  =*. 
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But  congruence  is  not  the  same  as  equality  unless  one 
can  show  separately  that  the  difference  between  a and  b is 
also  less  than  M.  The  following  basic  arithmetic  operations 
are  permissible  with  modular  arithmetic: 


a) 

Addition: 

2 + 6 = 8 = 1 (mod  7) 

b) 

Negation: 

-2  = -2  + 7 = 5 (mod  7) 

c) 

Subtraction: 

3-5  = 3+  (-5)  = 3 + (-5  + 

7) 

= 5 (mod  7) 

d) 

Multiplication: 

3 x 6 = 18  = 4 (mod  7) 

e) 

Multiplicative 
inverse : 

Multiplicative  inverse  of  an 

integer 

b in  ZM  exists  if  and  only  if  b 
and  M are  relative  primes.  In 
that  case  b x b ^ = 1 (mod  M) 

2_1  = 4 (mod  7);  2x4  = 8 = 1 (mod  7). 

f)  Division:  a/b  exists  if  and  only  if  b has 

an  inverse.  In  that  case 

a/b  = axb  4/2  = 4x4  = 16  = 2 (mod  7). 

Note  that  because  of  the  nature  of  modular  arithmetic, 
numbers  do  not  have  sizes  or  magnitude.  One  cannot  say 
that  a particular  number  is  larger  than  another  or  that 
numbers  are  close. 

Extracting  the  residue  is  a functional  transformation 
of  a into  b.  It  occurs  often  enough  in  what  follows  that 
it  deserves  a special  symbol,  <•>  . 
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<a>M  - b (2.6) 

The  subscript  M may  be  omitted  if  it  is  understood  from 
the  context. 

Computations  involving  residues  are  usually  simple 
because  it  is  never  necessary  to  work  with  quantities 
larger  than  the  modulus  unless  one  finds  it  convenient. 
Notice  that 


<x+y>  is  the  same  as  <<x>  + <y>> 

<x-y>  is  the  same  as  <<x>  - <y>>  (2.7) 

<xy>  is  the  same  as  <<x>  <y>> 

so  that  in  any  computation  involving  only  + , x one  may, 
X 

their  option,  replace  the  result  of  any  step  by  its  residue 
Example : 

<15+13>7  = <<15>?  + <13>7>7  = <l+6>7  = 0 

<12-11>7  = <<12>?  - <11>?>  = <5  - 4>?  ■ 1 

<9  x 14>7  = <<9>7  x <14>7>  = <2  x 0>7  = 0 

Note  that  if  it  were  necessary  to  divide  for  residue 
reduction  the  operation  would  be  quite  costly. 


at 
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Fortunately  there  are  simpler  techniques.  In  the 
simplest  case  - the  residue  or  a decimal  number  modulo  10 
is  its  last  decimal  digit,  since 


<a>io  - < l ai  1()1>io  = <ai  1q1>>  (2.8) 

i i 

and  <a^  10  is  zero  except  for  i = 0 term. 

Example : 

3 

<109 8>1q  = < l a.  10i>1Q 

i=0 

= <1  x 103  + OxlO2  + 9 x 101  + 8 x 10°>1Q 

<1098>1q  = ' 8 (mod  10) 


This  can  be  generalized  to  any  radix.  If  M is  a power  of 
two,  and  a is  represented  on  a binary  machine,  one  has  a 
trivial  method  of  extracting  <a>M- 

K— 1 

<a>  K = < I a.  <2L»  = l a.  21  (2.9) 

2 i i=0 

This  operation  is  performed  by  "masking  out"  all  but  the 
K least  significant  bits. 
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Example: 


2 

<15>  . = < [ a.  <21>> 

2 

i=0 

= <1111>  ^ = 1x2°  + 1 x 21  + 1 x 22  = 7 mod  8 

2 

i.e.,  here  K = 3,  and  only  the  3 least  significant  bits 
account  for  the  value  of  the  residue.  The  fourth  bit  (the 
most  significant  bit  in  this  case)  is  "masked  out." 

A.  SOME  IMPORTANT  RESULTS  IN  MODULAR  ARITHMETIC 

Euler's  function  is  defined  as  $(M) , the  number  of 
integers  in  the  finite  set  {0,1,2,  ...  M-l)  (which  is  called 
the  set  of  integers  mod  M,  and  denoted  by  ZM)  that  are 
relatively  prime  to  M. 

For  M a prime, 

<NM)  = M - 1 (2.10) 

Example:  let  M = 7.  The  ring  of  integers  mod  7 is 

Z?  = {0,1, 2, 3, 4, 5, 6}.  Since  M = 7,  is  a prime,  the 
integers  in  that  are  relatively  prime  to  M = 7,  are  all 
elements  of  the  set  (except  zero),  i.e. 

♦(in)  = M-l 

<f>(7)  = 7-1  = 6 


2 


i=0 
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If  M is  composite  and  its  prime  factored  form  is 


denoted  by 


M = 


(2.11) 


then  the  general  expression  for  0 (M)  is,  [13] 


<fr(M)  = 


M ( 1 - -i-)  (1  - -±-) 


(1  - -±-) 
Pi 


(2.12) 


Example:  Let  M = 12  = 2 x 3 


Z12  = {0,1,2,3,4,5,6,7,8,9,10,11} 


by  simple  counting  one  finds  4 numbers  that  satisfy  the 
conditions  of  being  relatively  prime  to  M.  Checking  by 
applying  equation  (2.12) 


♦(12)  = 12(1  - 


1] 


(1  - |)  = 12(|)(|)  = 4 


An  important  theorem  known  as  Euler's  theorem  states 
that  for  every  a relative  prime  to  M 


a*(M) 


1 (mod  M) 


(2.13) 


For  M prime,  this  reduces  to  Fermat's  theorem 
aM  1 ■ 1 (mod  M) 


(2.14) 
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since  they 


which  holds  for  all  nonzero  elements  of  Z,. , 
are  all  relative  prime  to  M,  if  M is  prime  by  assumption. 
Example:  Let  M = 7 


Z?  = {0,1, 2,3,4, 5,6} 

<p(7)  = 7-1  = 6 


which  holds  for  all 

,fM.  non-zero  elements 

or  ' = a =1  (mod  M)  of  Zv  since  they 

are  #11  relative 
primes  to  M. 

Here : 

l6  = 1 (mod  7) 

26  = 64  = 1 (mod  7) 

3 6 = 729  = 1 (mod  7) 

4 6 = 4096  = 1 (mod  7) 

5 6 = 15625  = 1 (mod  7) 

6?  = 46656  = 1 (mod  7) 


There  are  certain  roots  of  unity  that  are  of  particular 
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interest.  If  N is  the  least  positive  integer  such  that 


aN  = 1 (mod  M)  (2.15) 

then  a is  said  to  be  a root  of  unity  of  order  M,  or  simply 
of  order  N. 

If  the  order  of  a is  equal  to  4>(M)  , then  a is  called  a 
primitive  root. 

If  M is  prime  and  a is  a primitive  root  the  set  of 
integers 

{aK  (mod  M) , K = 0,1,2,  ...  M-2}  (2.16) 

is  the  total  set  of  non-zero  elements  in  Z.,.  Thus  all 

•M 

nonzero  integers  in  ZM  can  be  generated  by  powers  of  a 
primitive  root.  This  characterizes  the  entire  field. 

Example : 

Let  M = 7 

Z7  = (0,1, 2, 3, 4, 5,6} 

Looking  in  a table  of  primitive  roots,  for  example  [13] 
we  get 

3 and  5 are  primitive  roots  of  7, 


thus 
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{ 3K  (mod  7),  K = 0,1,2,. ..,5  = M-2} 


= {3°,31,32,33,34,35} 


= {1,3, 2, 6, 4, 5}  mod  7 


and  those  are  all  non-zero  elements  in  Z^.  The  same  for 


{ 5K  (mod  7),  K = 0,1,2,. ..,5} 


= {5°,51,52,53,54,55} 


= {1,5, 4, 6, 2, 3}  mod  7 . 


Euler's  theorem  implies  that  if  a is  of  order  N,  then  N 
must  divide  <j>  (M)  , denoted  by 


N | <J>  (M) 


(2.17) 


If  M is  a prime  it  can  be  shown  [13],  that  roots  of 
order  N exist  if  and  only  if 


N| (M-l) 


(2.18) 


and  the  roots  are  given  by  [13], 


a = a 


M-l/N 


(2.19) 


where  a denotes  a primite  root. 


27 


Example:  Let  M = 7 <t>(7) 


M - 1 


6 


Possible  order  of  roots:  N = 1,2, 3, 6 
primitive  root  (from  table) : 3 

Order  of  roots: 


a 


= 36/3  = 3 2 


= 2 mod  7 


= t6/6  = 3 mod  7 


a = 3 


a = 2 order  3 a = 3 order  (6 

/ 

primitive  root 


More  generally,  if  a is  a root  of  order  N then  [13] 
aK  is  of  order  N/K,  if  K|N 

(2.20) 

K 

a is  of  order  N,  if  N and  K are 

relatively  primes 


Example:  Let  M = 11 


<J>  (M)  = 10 


10 


Now  a = 2 is  a primitive  root  since  2=1  mod  11  and 


N 


10  is  the  least  positive  number  such  that 


N 


ot 


,N=10 


1 mod  11 


mod  7 

2 

= <p(7)) 
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Then 


aK  = 2^  = 4 mod  1 is  a root  of  order  N ^/K_2  = 5 

This  can  be  seen  as  follows: 

N = I 0123456789  10 


4N  1459314593  1 


i.e.,  the  root  a = 4 generates  a cyclic  subset  of  the  field 

with  N = 5 distinct  elements  (order  5) . 

K 3 

For  a =2  =8  mod  11  since  N = 10  and  K = 3 are  relatively 
primes  a = 8 mod  11  will  be  a root  of  order  N = 10,  or  a 
primitive  root. 

Checking: 


N = 

0 

1 

2 

3 

4 

5 

6 

7 

8 

9 

10 

8N 

1 

8 

9 

6 

4 

10 

3 

2 

5 

7 

1 

Notice  that  (2.20)  implies  that  the  number  of  roots  of 
order  N,  is  given  by  4>  (M)  , and  therefore  the  number  of 
primitive  roots  is  $(4>(M))  (since  for  a primitive  root 
N = <p  (M) ) . 

Example:  Let  M = 7 Z7  = {0,1, 2, 3, 4, 5, 6} 

Number  of  primitive  roots  = $($(7))  ■ <|>(6)  = 6(1— j)(1-j)  = 


as  seen  previously,  3 and  5 are  primitive  root  mod  7. 


Number  of  roots  of  order  N=3:  <J>(3)  = 3 - 1 = 2. 

a = 3 6/3  = 32  = 2 mod  7 a = 56/3  = 52  = 4 mod  7 

a =2  order  3 a = 4 order  3 

So  a = 2 mod  7 and  a = 4 mod  7 roots  of  order  N = 3 . 

These  relations  will  allow  one  to  calculate  all  of  the 
roots  of  all  possible  orders  from  one  primitive  root. 

We  can  summarize  these  ideas  more  precisely  in  the 
following: 

The  highest  possible  exponent  to  which  a number  can 
belong  mod  M is  $ (M) 

(i.e.  the  highest  order  of  a root  is  N = 4>  (M)  ) 

If  a number  has  order  N = <p  (M)  , we  shall  call  it 
a primitive  root  for  the  modulus  M. 

Not  every  modul  has  primitive  roots;  for  instance, 
for  the  modulus  M = 15,  one  finds  that  every  x in 
ZM,  relatively  prime  to  15  satisfy  the  congruence 

x4  = 1 (mod  15) 

and  yet  <p  (15)  = 8. 

To  find  the  primitive  roots  of  a modul  if  they  exist, 
one  must  usually  proceed  by  trial  and  error,  although 
there  are  certain  rules  that  may  facilitate  the  search. 

Often  one  of  the  small  number  2,  3,  5 or  6 may  turn 
out  to  be  a primitive  root. 


Extensive  tables  of  primitive  roots  for  primes 
have  been  computed.  The  first  of  these,  the  "Canon 
Arithmeticus"  (1839)  by  K.G.J.  Jacobi,  included 
primitive  roots  for  all  primes  below  1,000.  More 
recent  tables  by  Kraitchick,  Cunningham,  and  others 
give  primitive  roots  for  all  primes  up  to  25,000 
and  even  beyond. 

One  interesting  point,  not  mentioned  so  far,  is  the 
existence  of  multiplicative  inverse. 

Multiplicative  inverse  of  an  integer  b in  ZM  exists  if 
and  only  if  b and  M are  relatively  primes. 

In  that  case,  b ^ is  one  integer  such  that 


v w-1 

b x b • 


1 (mod  M) 


(2.21) 


Example:  Let  M = 7 

5_1  = ? 5 x 5_1  = 1 mod  7 5_1  = 3 

Since  5 x 3 = 15  = 1 nod  7,  i.e.,  if  M is  a prime  for 
every  non-zero  integer  a,  in  ZM,  there  exists  an  inverse 
[13] 


M-2 

a 


(2.21a) 


this  can  be  seen  by  another  example. 
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Example:  M = 7 


-1  7-2  5 

1=1=  1=1  mod  7 lxl 

-1  7-2  5 

2 = 2'  = 2 = 32  = 4 mod  7 

-1  7-2  5 

3=3  = 3 = 243  = 5 mod  7 

4_1  = 47-2  = 45  = 1024  = 2 mod  7 

5'1  = 57-2  = 55  = 3125  = 3 mod  7 

-1  7-2  5 

6=6  = 6 = 7776  = 6 mod  7 

Note  that  for  a non-prime  M,  a has  an 
[13]  : 


=1  mod  7 

2 x 2_1  = 2x4  = 

3 x 3_1  = 3x5  = 

4 x 4_1  = 4x2  = 

5 x 5_1  = 5x3  = 

6 x 6_1  = 6x6  = 
inverse  given  by 


8 = 1 

15  = 

8 = 1 

15  = 

36  = 


<1>(M)  -1 
a 


(2.22) 


if  a and  M are  relatively  primes. 

Example:  N = 12  = 22  x 3 

♦ (12)  - 12(1  - £ ) (1  - - 12  <i)  (|)  - 4 


ZM-12  = <0.1,2,3,4,5,6,7,8,9,10,11) 

So  there  are  $(12)  = 4 elements  in  Z12#  (1,5,7,11)  that  are 
relatively  prime  to  M ■ 12. 


mod  7 


1 mod  7 


mod  7 


mod  7 


mod  7 
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This  suggests  that  in  this  case  b 1 = b.  Therefore,  we 
verify  that  by  considering  M a composite  rather  than  a 
prime,  one  observes  several  differences. 

If  we  define  Z,.  as  the  ring  of  integers  modulo  M, 
then  if  in  a ring  of  integers  multiplicative  inverses  exist 
for  all  non-zero  integers,  this  ring  is  called  a field. 

A field  with  a finite  number  of  elements  is  also 
called  a Galois  field.  It  is  clear  then,  that  ZM  is  a 
field  if  and  only  if  M is  a prime. 

ZM  is  not  a field  for  a non  prime  M,  since  not  all 

elements  will  have  multiplicative  inverses.  Also,  there  is 

./* 

no  primitive  root  that  will  generate  the  entire  ring,  (only 
subsets  with  <J>  (M)  elements,  at  most.) 
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Example : 


M = 8 = 2 3 


4>(M) 


4 


subset  with  4 elements 
root  of  order  N = 2 
subset  with  3 elements 
root  of  order  N = 2 
subset  with  4 elements 
root  of  order  2 


In  these  conditions,  the  previous  example  shows,  that 
there  are  multiplicative  inverses,  only  for  those  elements 
in  Zg  relative  primes  to  M = 8. 

We  can  investigate,  a little  more,  the  arithmetic 
modulo  M,  when  M is  not  a prim6.  Let  M have  the  following 
unique  prime  power  factorization  (2.11) 


when  the  arithmetic  is  done  mod  M,  it  is  in  effect  done 
modulo  each  prime  power  p^  simultaneously  [13] . 

A set  of  arithmetic  operations  can  be  done  either 
modulo  each  p^^  separately  and  the  final  result  mod  M 
obtained  using  the  Chinese  remainder  theorem  [13],  or 
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alternatively  all  the  operations  may  be  done  mod  M,  but 

ri 

they  must  be  valid  operations  mod  for  each  p^ 

In  these  conditions,  an  integer  a is  said  to  be  of 
order  N in  ZM  if  and  only  if  it  is  of  order  N in  each 


Here  we  present  some  basic  results. 


a = b mod  M 


is  true  if  and  only  if 


a = b mod  p^ 


i = 1,2, ...  £ (2.23) 


If  we  know  the  residues  of  an  integer  a modulo  each  p^ 

We  can  uniquely  reconstruct  the  integer  a mod  M using  the 
Chinese  Remainder  Theorem. 

To  establish  this  theorem,  let 


a = ai  mod  p^^  1 
dL  = M/(Pi  i) 


(2.24) 


(2.25) 


-1  ^ (d.  mod  p.  i)_1  mod  p.  L 


(2.26) 
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then 


P 

a = ( I d^  1 a^)  mod  M 

i=l 

Example : 

Let  a = 123  and  M = 24  = 23  x 3 

Calculation  of  a^s: 

a = 123  = a^  mod  2 3 a ^ = 3 mod  8 

a = 123  = a2  mod  3 a2  = 0 mod  3 

Calculation  of  d^s: 

rl  3 

dl  = M/pl  = 24/2  = 3 mod  8 

r2 

d2  = M/p2  z = 24/3  =8=2  mod  3 

-1 

Calculation  of  d^'s: 

dj-1  = (dx  mod  px  1)~1  = (3  mod  8)_1  = 3<<)(8,“1  = 

d2_1  ^ (d2  mod  p2  2)_1  = (2  mod  3)'1  = 2M02  = 23 

then 


(2.27) 


33  = 3 mod  8 


= 2 mod  3 
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2 


a = ( l cL  cL  ^ a^)  mod  M 

i=l 

a = (3x3x3)  + (2x2x0)  = 27  = 3 mod  24 


Check : 


a = 123  = 3 mod  24. 
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III.  TRANSFORMS  IN  FINITE  FIELDS 


The  basic  operations  of  signal  filtering  is  convolution. 
In  the  discrete  time  signal  processing  situation,  convolution 
takes  the  form 

00 

y(n)  = x(n)  * h(n)  = l x(m)  h(n-m)  (3.1) 

n=-°° 

n - 0 , 1,  2,  ... 

Where  h(n)  is  the  response  to  a unit  impulse,  for  a causal 
filter,  then 


h (n)  - 0 for  n < 0 

and,  when  the  duration  of  the  impulse  response  is  finite, 
the  infinite  sum  (3.1)  reduces  to  a finite  sum 

N-l 

y(n)  = l x(m)  h(n-m)  (3.2) 

m=0 

where  N is  the  length  of  the  finite  impulse  response 
(FIR)  filter. 

The  processing  workload  required  to  evaluate  (3.2) 
can  be  reduced  significantly  if  direct  computation  is 
replaced  by  transforms  methods.  This  is  so  provided  the 
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application  in  question  allows  sequences  to  be  processed 
in  blocks. 

The  Discrete-Fourier  Transform  (DFT)  is  one  of  the  most 
versatile  transforms,  and  is  defined  by 


« A . . II  » __ 

A -](— )nK 

DFT  = X (K)  = l x (n)  e N (3.3) 

n=0 

K = 0,  1,  ...  N-l 


and  its  inverse  transform  by 


N-l  . . 2tt . „ 

a l j(-Tj-)nK 

IDFT  = x (n)  = ± l X (K)  e 

K=0 

m = 0 , 1 , ...  N-l 


(3.4) 


where  N is  the  length  of  the  sequence  which  transforms  one 
wants  to  calculate. 

In  order  to  use  the  DFT  in  the  high-speed  implementation 
of  convolution,  one  makes  use  of  its  cyclic  convolution 
property,  which  states  that 

DFT [h (n)  * x (n) ] = DFT [h (n) ] • DFT [x  (n) ] (3.5) 


and  this  implies  that  convolution  can  be  implemented  using 
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y (n) 


IDFT{DFT [h (n) ] • DFT [x (n) ] } 


(3.6) 


The  convolution  implemented  by  (3.6)  is  called  cyclic  convo- 
lution since  it  evaluates  (3.2)  as  if  h(n)  and  x(n)  were 
periodically  extended  outside  the  range  [0,N-1],  or  equiva- 
lently, the  indices  were  evaluated  modulo  N.  Notice  that 
normal  finite  convolution  can  be  calculated  by  cyclic 
convolution  if  zeros  are  apended  to  x(n)  and  h(n)  to  prevent 
folding  or  aliasing  [34] . 

The  DFT  is  a transform  of  finite  sequences,  of  real  or 
complex  numbers.  One  might  ask  if  there  are  transforms  in 
other  number  fields.  That  is,  given  a sequence  of  numbers 
modulo  M,  is  there  a transform  that  has  the  cyclic  convolu- 
tion property?  Ohe  will  see  that  the  answer  to  this  question 
is  yes. 

If  one  has  a sequence  of  numbers  of  length  N,  then  a 
transform  of  the  form  given  by  the  pair 

N-l 

X (K)  = l x (n)  anK  (3.7) 

n=0 

N-l 

x (n)  = | l X (K)  a_nK  (3.7a) 

K=0 

is  said  to  have  a DFT  structure  [34] . 
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If  one  looks  for  the  properties  that  a general  trans- 
form (3.1)  having  the  DFT  structure  must  have  to  exhibit 
the  cyclic  convolution  property,  one  finds  [4] , [35]  , that 
it  depends  on  the  existence  of  an  a that  is  a root  of  unity 
of  order  N,  i.e. , 

N 

a = 1 (3.8) 

and  that  N ^ ^ exists,  i.e.,  the  inverse  of  N exists. 

It  has  been  shown  [35]  that  in2^he  complex  number  field  the 

. • ^ N 

conventional  DFT  with  a = e is  the  only  transform,  with 

that  property.  However,  by  working  in  a finite  field  or 
ring  of  integers  with  arithmetic  carried  out  an  integer 
M,  a large  class  of  transforms  exist  [14]  that  have  the 
cyclic  convolution  property.  By  special  choices  of  the 
length  N,  the  modulo  M,  and  the  value  a,  it  is  possible  to 
have  transforms  with  many  interesting  properties.  These 
are  called  number  theoretic  transforms. 

The  possibilities  of  interest  are,  [14]: 

1)  the  ring  of  integers  Z^r  with  respect  to  modulo  M 

2)  the  field  of  integers  GF(p),  with  respect  to  prime 
modulus  p 

3)  the  Galois  field  GF(p^)  of  p^  elements. 

Let  ZM  represent  the  ring  of  integers  (0,1  ...  M-l}  (2.4) 
with  arithmetic  carried  out  mod  M.  Let  M have  the  following 
unique  prime  power  factorization: 
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M 


(2.11) 


where  the  pVs  are  distinct  primes. 

As  pointed  out  previously  (section  II) , when  one  carries 

arithmetic  mod  M,  one  is  in  effect  doing  it  modulo  each 
r . 

p^  1 simultaneously.  Therefore,  the  length  N number  theoretic 
transform,  having  the  cyclic  convolution  property  in  ZM, 
must  also  have  that  property  in 


Z r . for  i = 1 , 2 , ...  i . 

P.1 

l 

r . 

This  requires  that  a (mod  p^  ; , an  integer  of  order  N 
must  exist  in  Zr.  , i.e.,  N is  the  least  positive  integer 

Pi1 

such  that 


N 

a 


r . 

1 (mod  pi  1) , i = 1,2, 


a (3.9) 


3 

Example:  Let  M = 24  = 2 x 3 

then 


Z 

2 


3 


{0,1, 2, 3, 4, 5, 6, 7} 


and 


Z3  = {0,1,2} 
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For  (3.9)  to  be  satisfied,  since 


i)  for  Z. 


N 

0 

1 

2 

1N 

1 

1 

1 

1 

o 

1 

order  2 


(3.10) 

, i.e.,  a = 1 order  N = 1 
a = 2 order  N = 2 


and 


ii)  for  Z 


1 

2 


(3.11) 


1 1 1 1 1 1 1 1 
12400000 

i.e.,  a = 1 order  N ='  1 


Then,  a = 1 (mod  24)  is  a root  of  order  N = 1,  thus  the 

length  of  the  number  theoretic  transform  possible  in  the 

ring  Z24/  is  N = 1 (not  a very  interesting  result!). 

Furthermore,  since  the  inverse  transform  requires  the 

existence  of  the  inverse  of  N,  i.e.,  N 1,  this  number  should 

exist  in  Zj,  , or  N should  be  relatively  prime  to  M (2.21). 
Pi1 

In  this  example, 

(i)  for  Z, 
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one  wants 


= ? 


T 


By  (2.21a) 


N-1  = l"1  = l3-2  = l1  = 1 mod  3 


(ii)  for  Zt 


also  required  is  N-1  = 1_1  = ? 


By  (2.22) 


N-1  = l"1  (mod  8)  = 1*(M)_1 


but 


<J>  (M=8) 


8(1  " 


= 4 


then 


N_1  = l"1  (mod  8) 


= l4"1  = l3  = 1 (mod  8) 


Thus  we  have  verified  the  existence  of  a number  theoretic 
transform  of  length  N = 1 , in  the  ring  of  integers  Z24- 
In  general,  the  existence  of  an  of  order  N,  each  Z^ri 
can  be  investigated  recalling  Euler's  theorem  (2.13). 

That  is,  by  observing  (3.4)  and  Euler's  theorem,  one  has 
the  condition 
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i — 1/  2 , • • . , 


(3.12) 


N|<Mpi  x) 

divide  4>  (p^  1)  . 

Let  M = 11 x 31  = 341 

Z11  = {0'1'2'  10} 

and 

Z31  = t0,;L'2'  •••'  3°} 

i)  * for  Z11 

(j>(ll)  = 11-1  = 10  , implies  possible  values  of 

order  N for  the  roots  in 
Zu  (1,2,5,10) 

ii)  for  Z^i 

<J>(31)  = 31-1  = 30  , implies  possible  values  of 

order  N for  the  roots  in 
Z31  (1,2,3,5,6,10,30) 

That  is,  for  (3.12)  to  be  satisfied,  in  the  conditions  given 
by  i)  and  ii) , the  possible  values  for  the  length  of  the 
NTT  in  Z341  are  (1,2,5,10). 

45 


i£. , N should 

Example : 

Then 


Notice  that  by  (3.9),  whatever  the  root  in  consideration, 
it  has  to  be  the  same  order  in  as  that  in  Z^. 

For  Z^,  one  constructs  the  following  table. 


root  of  order  1 
10 
5 
5 
5 

10 
10 
10 
5 

root  of  order  5 


(3.12a) 


For  a s-milar  table  is  constructed,  from  which  only 

a part  is  shown  (notice  that  only  N = 1,  2,  5,  10  are  of 
interest) . 


N 

0 

1 

2 

5 

10 

1N 

1 

1 

1 

1 

1 

root 

of 

order 

1 

2N 

1 

2 

4 

1 

1 

root 

of 

order 

5 

3N 

1 

3 

9 

26 

25 

root 

of 

order 

30 

4N 

1 

4 

16 

1 

1 

root 

of 

order 

5 

5N 

1 

5 

25 

ioN 

1 

30 

1 

30 

root 

of 

order 

2 
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Thus  the  root  a 


= 4 is  of  order  5 in  both  Z^  and 
and  so  the  length  of  the  NTT  in  the  ring  Z34  can  be  equal 
to  5.  Furthermore,  since  the  multiplicative  inverse  of  an 
integer  b in  ZM  exists  if  and  only  if  b and  M are  relatively 
prime,  and  for  the  inverse  transform  one  requires  N 3,  N 
should  be  relatively  prime  to  M (or  p^'s). 

In  the  last  example: 

N = 5 is  relative  prime  to  N = 243  so  5 3 mod  341 
exists  and  is  given  by 

5MM)-1  = 5*<341)-1  mod  341 

but 

<{.(341)  = 341  (1  - IQ-)  (1  - 3T>  = 300 

so 

5-1  mod  341  = 5300"1  mod  341  = 5299  mod  341 

To  find  this  number  notice  that 

299  = 256  + 32  + 8 + 2 + 1 = 28  + 25  + 23  + 21  + 2° 


so 


5299  = 5256.532.58.52.5 
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But 


5 2 = 25  mod  341 

5 4 = 284  mod  341 

5 8 = 180  mod  341 


516 

= 5 mod  341 

532 

= 25  mod  341 

564 

= 284  mod  341 

5128 

= 180  mod  341 

j-256 

= 5 mod  341 

Then 

299 

5 mod  341  = (5  - 25  ♦ 180  -25-5)  mod  341 

= 273  mod  341 

Check : 

(5  x 273)  mod  341  = 1 mod  341. 

So 

5-1  mod  341  = 273  mod  341. 
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That  is,  it  is  verified  that  a number  theoretic  transform 
of  length  N = 5 exists  in  %24l'  Now»  the  condition  N 
relative  prime  to  M (or  p^'s)  means 


N | (pi  - 1)  , i = 1,  2,  . . . , Z (3.13) 


i . e . 

N | gcd  {px-l,  p2-1,  •••/  _1 > • 


0 (M)  is  defined  as  the  greatest  common  divisor  (gcd) 
of  the  (p.^  - 1) 

0(M)  = gcd  (p^l,  P2-1/  •••»  (3.14) 


Therefore 


N | 0 (M)  (3.15) 

This  last  equation  gives  the  necessary  condition  for  the 
existence  of  a transform  of  length  N in  the  ring  ZN  with 
arithmetic  carried  out  an  integer  M. 

Example:  Let  M = 11  x 31  = 341 

0 (M)  = gcd{ 11-1 , 31-1 } = gcd { 10 , 30 } = 10 
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Notive  that  N = 5 satisfies  the  condition  (3.15)  since 


N = 5 1 (O(M)  = 10.) 

It  remains  to  be  investigated  further  whether  or  not  a 
given  a of  order  N = 10  exists  in  both  Z^  and  Zg^. 

Now,  consider  the  converse  of  condition  (3.15).  If 

ri  r. 

N|0(M)  or  N | <p  (p^  ),  then  there  exists  integers  (mod  ) 

or  oder  N in  Z r [13]  . 

Pi1  r. 

Using  these  cu's  one  can  construct  transform  (mod  p^  1) 

which  have  the  DFT  structure  (3.7)  and  are  invertible. 

Combining  these  transforms  by  the  Chinese  remainder 

theorem  (2.27),  one  can  obtain  a transform  (mod  M)  having 

the  cyclic  convolution  property  in  ZM-  Alternatively  one 

can  combine  the  ' s by  the  Chinese  remainder  theorem  to 

obtain  an  a (mod  M)  of  order  N in  Zw  and  construct  the 

M 

final  transform  using  this  a.  The  results  will  be  identical. 
Example:  Let  M = 5 x 17  = 85 


4>  ( 5 ) = 5-1  = 4 possible  values  of  N:  1,2,4 


( 17 ) = 17-1  = 16  possible  values  of  N:  1,2,4, 


If  one  looks  for  an  a of  order  4 in  Zg^,  by  constructing 
tables  for  Z5  and  Z^7  as  those  in  (3.12  a,b) , one  finds 


1 


i 


8,16 
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2 mod  5 (order  4,  in  Z ,.) 


a2  = 4 mod  17  (order  4,  in  1^) 

Using  the  Chinese  remainder  theorem  (2.27) : 
(i)  d^'s  calculation 


dx  = (5  x 1 7 ) / 5 = 17  mod  5 = 2 mod  5 


d2  = (5  x 17)/17  = 5 mod  17 

-1 

(ii)  Calculation  of  d^  's. 


dx  = (2  mod  5)'1  = 25*2  mod  5 = 23  = 3 mod  5 (by  2.21a) 

i 

d2_1  A (5  mod  17) _1  = 5M_2  = 517-2  = 515  = 7 mod  17  (by  2.21a) 
Then  (by  (2.27) 


a = ( (2  x 2 x 3)  + (4  x 5 x 7)  ) mod  85 


a = (12  + 140)  mod  85 


a = 152  mod  85  = 67  mod  85 


Check : 


mod  85 
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Notice  that  a 


a = 64 
To 
N = 4, 


By  (2. 


So 


Since 


But 


= 64  mod  5 is  of  order  4 in  Z,.,  and  also 
mod  17  is  of  order  4 in  Z^. 

establish  the  existence  of  a NTT  in  Zg,.,  with  length 
one  has  to  find  the  inverse  of  N - 4 \ i.e.. 


22) 


4-l  = 4 4>  (85)  -1  and  ^(85)  = 85  (1  _ 1)  (1  _ _1_)  = 


4 1 = 464-1  = 463  mod  85. 


63  = 32+16+8+4+2+1 


,63 


= 4 


32 


16 


4 1 = 4 mod  85 

4 2 = 16  mod  85 

4 4 = 1 mod  85 


64 
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1 mod  85 


1 mod  85 

1 mod  85 

so 

4_1  = 46 3 = 1 • 1 * 1 ' 1 • 16 • 4 = 64  mod  85 


416  = 
432  = 


Check : 


4x64  = 1 mod  8 5 . 

The  fact  that  condition  (3.15)  holds  as  well  as  its 
converse  [13]  means  that  (3.15)  is  the  necessary  and 
sufficient  condition  for  the  existence  of  an  invertible 
transform  of  length  N which  has  the  cyclic  convolution 
property,  mod  M. 

This  also  establishes  that  the  maximum  transform  length 
in  ZM  is 


N = 0(M)  (3.16) 

max 

Notice  that  for  a given  modulus  one  knows  exactly  what  are 
the  possible  transform  lengths  in  Z 

For  any  NTT  to  be  computationally  efficient,  there  are 
three  main  requirements  [17]  : 
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(i)  N,  the  transform  length,  should  be  highly  composite 
(preferably  a power  of  2)  for  an  FFT-type  algorithm 
to  exist,  and  should  be  large  enough  for  practical 
sequence  lengths. 

(ii)  The  multiplication  by  powers  of  u(3.7)  must  be  a 

a simple  operation.  This  is  possible  if  the  powers 
of  a have  a binary  representation  with  few  bits. 

(iii)  To  simplify  the  arithmetic  modulo  M,  M should  have 
a binary  representation  with  very  few  bits  and 
be  large  enough  to  prevent  overflow. 

Although  the  class  of  all  possible  number  theoretic  trans- 
forms seems  very  large  at  first  consideration,  closer 
examination  shows  that  very  few  seem  to  satisfy  the  afore- 

• p 

mentioned  criteria.  The  parameters  that  must  be  chosen  are 
M,  N and  a. 

Briefly,  one  verifies  that  if  M is  even,  it  has  a factor 

of  2 and,  therefore,  0(M)  and  N are  1,  which  implies 

max  r 

M should  be  odd.  If  M is  prime  then  0(M)  = M-l  which  is  as 
large  as  one  could  hope  for  in  a field  of  M integers. 

In  section  IV,  the  various  types  of  NTT  are  discussed, 
but  one  can  say  that  conditions  ( (3. 8) , (3. 16) ) do  not  give 
a systematic  way  of  determining  the  "best"  choices.  As  a 
result  one  must  use  intuition,  insight  and  a bit  of  searching. 
Usually  M is  selected  and  the  resulting  possible  N and  a 
are  then  examined. 
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IV.  NUMBER  THEORETIC  TRANSFORMS 


One  computational  need  in  the  digital  processing  of 
signals  is  the  evaluation  of  the  circular  convolution 
summation  of  two  sequences  of  length  N.  That  is,  the 
evaluation  of 

N-l 

y(n)  = l x(n-m)  h (m)  (4.1) 

m=0 

n = 0,  1,  ...  N-l 

The  so-called  fast  convolution  procedure  obtains  this 
sum  by  taking  the  inverse  transform  of  the  product  of  the 
transforms  of  the  two  sequences.  if  the  transform  used  is 
the  discrete  Fourier  transform,  then  error-free  results 
are  obtained  only  if  infinite  precision  arithmetic  is 
assumed.  This  is  true,  even  if  both  sequences  are  composed 
of  finite  precision  numbers  because  the  Fourier  transform 
involves  the  irrational  number  exp  [- j (2tt/N)  ] . 

One  way  to  avoid  the  round-off  errors  induced  by  the 
transform  is  to  make  use  of  the  fast  transforms  over  the 
finite  field  [14] . 

By  working  in  a finite  field  or  ring  of  integers  with 
arithmetic  carried  out  modulo  an  integer  M,  a large  class 
of  transforms  exist  that  have  the  cyclic  convolution  property 
(3.5).  By  special  choices  of  the  length  N,  the  mod  M,  and 
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the  value  of  a,  it  is  possible  to  have  transforms  that  need 
only  word  shifts  and  additions  but  no  multiplications,  that 
have  an  FFT  type  fast  algorithm,  that  do  not  require 
storage  of  complex  values  of  a,  and  that  have  no  roundoff 
errors.  These  transforms  are  the  Number  Theoretic  Trans- 
forms (NTT)  and  they  look  very  promising  in  the  evaluation 
of  finite  convolutions.  Their  main  disadvantage  seems  to 
be  a relation  of  the  sequence  length  N to  the  required  word 
length  that  can  require  long  word  lengths  for  long  sequences. 

This  section  begins  by  discussing  Mersenne  and  Fermat 
number  transforms,  that  proceeds  historically  all  subse- 
quent work  in  this  field.  In  part  B,  other  number  theoretic 
transforms  that  provide  more  choices  of  word  lengths  and 
transform  lengths  than  Mersenne  and  Fermat  number  transforms 
are  discussed. 

A.  MERSENNE  AND  FERMAT  NUMBER  TRANSFORMS 

Rader  [3]  suggests  performing  the  calculations  of  a 
transform  with  the  DFT  structure  (3.7),  in  the  ring  of 
integers  modulo  a Mersenne  number.  Such  numbers  are  defined 
by  [3] 


p = 2q  - 1 (4.2) 

where  q is  prime,  but  p is  not  necessarily  prime. 

In  the  ring  p = 2q-l,  a = 2 is  a root  of  the  q*"*1  order 
since 
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(2q  - 2q  + 1) 


1 mod  2q -1  (4.3) 


aq  = 2q  = 


Under  these  conditions,  a Mersenne  transform  of  an 
integer  sequence  (an)  having  q terms  is  defined  by  [3] 


q-i 

Ak  = ( l an  2nK)  mod  p (4.4) 

n=0 

K = 0,1  ...  q . 

Because  q has  an  inverse  modulo  p [3],  the  inverse  Mersenne 
transform  will  be 

q-1 

am  = (q  1 \ 2 mK)  mod  p (4.5) 

K=0 

m = 0 , 1 ...  q-1 

where  all  exponents  and  indices  being  taken  modulo  q and 
all  operations  being  performed  modulo  p in  both  (4.4)  and 
(4.5)  . 

It  can  be  demonstrated  [3]  that  the  Mersenne  transform 
satisfies  the  convolution  theorem;  that  is  to  say,  if 
{ X„ } is  the  Mersenne  transform  of  {x  },  then  with  Z,  = A ‘X,, 

k n K IS.  fv 

the  Inverse  Mersenne  transform  {Z  } of  (Z  } is  given  by 

m x\ 

q-1 

Z = ( l ax)  mod  p ( 

m n m-  n 

n=0 


mod  p. 
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If  (an>  and  (xn>  are  Properly  bounded  [3] , Zm  is  equal 
to  the  output  of  the  ordinary  cyclic  convolution  with 


Z = Tax 
m L n m-n 


(4.7) 


Under  these  conditions,  digital  filtering  of  real  integer 
sequences  can  be  performed  by  dividing  the  sequences  into 
blocks,  padding  the  blocks  with  zeros  [34]  to  prevent  folding, 
and  aliasing  and  computing  the  cyclic  convolutions  by  means 
of  Mersenne  transforms. 

The  number  of  transform  terms  can  be  extended  to  2q, 
since  a = -2  mod  p is  a root  of  order  2q: 


2q 


>2q  = (-2q) 2 = (~2q  + 2q  - 1) 2 = (-1)2  = 1 mod  2q-l 


(4.8) 


Example : 


Tnen 


Let  q = 7 


<3  7 

p = 2-1  = 2-1  = 127 


a = -2  mod  127  = (-2  + 127)  = 125  mod  127 


Notice  that  2q  = 14,  so 
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Now, 

14  = 23  + 22  + 21  = 8+4+2 


Thus 


12514  = 1258  • 1254  • 1252 


but 

1252  = 4 mod  127 

1254  = 16  mod  127 

1258  = 2 mod  127 

and 


12514  = 2 * 16  • 4 mod  127 

= 128  mod  127 

12514  = 1 mod  127. 

Notice  also  that  the  inverse  of  2q  = 14,  mod  127  is 

a_1  - 14_1  mod  127  = 14127'2  = 14125  mod  127  (by  2.21a). 
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By  performing  some  simple  calculations. 


14  1 = 118  mod  127.  Check:  14  x 14_1  = 14 x 118  = 1 mod  127. 

Thus,  a Mersenne  transform  exists  which  has  the  cyclic 
convolution  property  for  sequences  of  length  N = 2q,  with 
ol  = -2  replacing  exp[-2Trj/N]  as  the  Nth  primitive  root  of 
unity  in  (3.3),  and  with  all  calculations  in  (3.7)  done  in 
arithmetic  modulo  p. 

Rader  advocated  such  a transform  since  using  a = 2 
or  a = -2  as  a root  of  unity  would  necessitate  only  shift 
and  add  operations  in  computing  the  transform  (3.7). 

Because  Mersenne  transforms  are  evaluated  without 
multiplications,  computation  of  a time-invariant  circular 
convolution  (4.1)  having  N = q points  reduces  to  one  multi- 
plication per  output  sample,  as  opposed  to  q multiplications 
with  direct  calculation.  To  compute  a FFT  of  a sequence 
of  length  N = q requires  of  the  order  of  (N/2)  log2<N/2) 
complex  multiplications  [17] . 

The  main  limitations  of  Mersenne  transform  approach  are 
related  to  the  fact  that  the  number  of  transforms  terms 
q (or  2q)  is  not  highly  composite,  since  q is  a prime.  This 
means  that  calculations  of  the  transforms  cannot  be  simpli- 
fied by  an  FFT- type  algorithm. 

K K 

If  one  considers  p = 2 +1  and  K odd,  3 divides  (2  + 1) 

and  the  largest  possible  transform  is  2,  thus  one  considers 
only  K even. 
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Let  K 


one  has 


t K 

= s 2 , s odd,  t an  integer.  Then  since  p = 2 + 1, 


2s2  + 1 


2+1 


= n , n an  integer. 


(4.9) 


And  the  length  of  possible  transforms  will  be  governed  by 

2fc 

the  length  possible  for  2 +1  (see,  3.15).  Therefore 

integers  of  the  form 


p = 2"  + 1 


(4.11) 


t = 0,1, 2, 3... 

are  of  interest.  These  numbers  are  called  Fermat  numbers  and 
are  defined  by  Ffc  = p in  (4.11).  For  t=0  to  t = 4 the  Fermat  numbers 
are  prime.  For  t > 4 there  are  no  known  Fermat  number  prime. 

Number  theoretic  transforms  with  a Fermat  number  as 
the  modulus,  are  called  Fermat  number  transforms  (FNT) . 

By  (3.15),  for  the  FNT  of  length  N to  exist  N must 
divide  0 (Ffc  = p) . 

Notice  that  (by  3.16),  for  Ffc  prime 


"max  - 0(Ft>  * ^ ' b * 2t 


(4.12) 


and  one  can  have  FNT,  for  any  length 


N 


2m, 


m < b 


(4.13) 
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Example : 


Let  p 


17 


2 4 

= F 2 = 2+1  = 2H  + 1 = 

If  one  constructs  the  following  table: 


N 

0 

1 

2 

3 

4 

5 

6 

7 

8 

9 

10 

11 

12 

13 

14 

15 

16 

order 

1 

1N 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

order 

8 

2N 

1 

2 

4 

8 

16 

15 

13 

9 

1 

2 

4 

8 

16 

15 

13 

9 

1 

order 

16 

3N 

1 

3 

9 

10 

13 

5 

15 

11 

16 

14 

8 

7 

4 

12 

2 

6 

1 

order 

4 

4N 

1 

4 

16 

13 

1 

4 

16 

13 

1 

4 

16 

13 

1 

4 

16 

13 

1 

order 

16 

5N 

1 

5 

8 

6 

13 

14 

2 

10 

16 

12 

9 

11 

4 

3 

15 

7 

1 

order 

16 

6N 

1 

6 

2 

12 

4 

7 

8 

14 

16 

11 

15 

5 

13 

10 

9 
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(4.13a) 


one  sees  that  (3,5,6,7,10,11,12,14)  are  primitive  roots 


that  will  generate  the  entire  field  Z ^ . The  root  a = 2 

2 2 

is  of  order  8 and  a =2  = 4 is  of  order  8/2  = 4 (by  2.20). 

2 

Also  note  that  11  = /2 , in  the  sense  that  11  =2  mod  17. 

That  is,  for  the  ring  Z one  has  the  possibility  of 
choosing  a FNT  of  lengths  (1,2,4,8,16)  thus  satisfying 
(4.12)  and  (4.13). 

For  digital  filtering  applications  the  composites 
Fj.  (b  = 32)  and  F^  (b  = 64)  seem  to  be  practical  [4]  . 

Lucas  [6]  has  proven  that  every  prime  factor  of  a composite 
Ffc,  is  of  the  form 

K 2t+2  + 1 (4.14) 

t+2 

Therefore,  2 divides  0 (Ffc) , for  t > 4.  ' 

2^ 

Example:  Consider  F^  = 2 +1=4  294  967  297 

= 641 x 6 700  417 

0(F5)  = g.c.d. { (641-1) , (6700  517-1)  } 

by 


0(F5)  = g.c.d. {640,  6 700  416} 

640  = 27  • 5 


6 700  416 


17449 
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Then 


0 (F5)  = 27  = 128  = 25+2  where  t = 5. 

Therefore,  for  the  choice  of  Fermat  numbers  with  t > 4, 
the  maximum  possible  transform  length  is  given  by  (see,  3.16) 

N =0  (F  ) = 2t+2  = 22  • 2t  = 4b  (4.15) 

max  t 

where 

t > 4,  b = 2t  . 

Agarwal  [4]  proved  that 

a = 2b/4(2b/2  - 1)  (4.16) 

is  a root  of  order  4b,  in  Z„  , t > 2. 

Ft 

Notice  that 

a2  = [2b/4(2b/2  - l)]2  = 2b/2(2b/2  - l)2 

a2  = 2b/2(2b  - 2-2b/2  +1)  = 2b/2(-2  • 2b/2) 
a2  = (-2)  2b 

Thus 
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T 


a2  = 2 mod  2b  + 1. 


Since  a = 2 mod  F^_ , the  notation  a = /2  for  (4.16) 
will  be  adopted  in  this  thesis  following  a general  procedure. 

Also,  note  that  any  odd  power  of  /2  will  also  be  of 
order  2t+2  (by  2.20),  i.e. 


ad  = /3d  , 


d odd 


(4.17) 


t+  2 

is  a root  of  order  2 . And  raising  a - v2  (4.16)  to 

2^t+2  m)  pQwer  one  obtains  an  integer  a'  of  order  2m, 


m < t+2,  i.e. 


(t+2-m) 


= a 


a'  being  a root  of  order  2m,  m < t+2. 


(4.18) 


Example : 


2 4 

Let  p = F2  = 2 +1  = 2+1  = 17 


i.e. 


t 2 

b = 2 = 2 = 4 


Then 


a = 2b/4(2b/2-l)  = 24/4(24/2-l)  = 2(22-l) 


a = 2(3)  = 6 mod  17 
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is  a root  of  order  4b  = 4*4  = 16.  Observing  (4.13a)  one 
sees  that  this  result  is  correct. 

Notice  also  in  (4.13a)  that  6 = /2  in  the  sense  that 

2 

5 6=2  mod  17.  So 

I 

a = /2  = 6 mod  17. 

If  one  raises  this  a to  2 , m = 3 <.  t+2  = 4 

i.e. 

2 ( 2+2- 3 ) = 2 

a2  = 62  = 2 mod  17 

and  a = 2 is  a root  of  order  2m  = 2 2 = 8 (4.18) 

Observing  (4.13a)  one  verifies  that  this  is  also  a 
correct  result. 

If  one  raises  a = 6 to  an  odd  power  say  d = 5, 

5 5 

a =6  =7  mod  17  is  a root  of  the  same  order  as  a = 6 , 

* namely  of  order  4b  = 16,  verifying  (4.17).  Observing  (4.13a) 

one  verifies  that  this  is  correct. 

Notice  that  a = 2 mod  (2°+l)  is  of  order  N = 2b,  since 

f 22b  = 22‘2t  = 22‘+1 

but 
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\ 


L 


(4.19) 


22b  = ( 2b) 2 = (-1)2  = 1 mod  (2b  + 1) 

Thus  for  FNT's  with  a prime  or  composite  modulus  one  sees 
that  a = 2 (or  a power  of  2)  is  a possible  root  of  order 
N = 2b  = 2 • 2 b = 2 t+1 . 

This  means  that  sequences  up  to  a length  N = 2b,  in 
this  case,  can  make  use  of  FNT . 

This  is  a very  desirable  situation,  since  N = 2t+'*' 
is  highly  composite  allowing  an  FFT  type  algorithm  and  all 
multiplication  by  powers  of  a are  simple  word  shifts. 

i.  i Q 

If  ot  = /2  is  used  then  sequences  of  length  N = 4b  = 2^ 

are  possible  (4.16).  Recalling  that  Fermat  numbers  up  to 

F4  are  prime  0(F(t))  = 2b  (by  3.14),  and  in  these  cases  one 

can  have  an  FNT  fof  any  length  N = 2m,  m £ b. 

Notice  that,  for  these  Fermat  numbers,  a = 3,  is  a 

root  of  order  N = 2b  (see  4.13a),  allowing  the  largest 

possible  length,  in  the  correspondnet  ring  Z . 

Ft 

The  following  table  shows  some  parameters  for  several 
possible  implementations  for  FNT's: 


t 

b = 2t 

F = 2b+l 

N = 2b 
(a  - 2) 

N = 4b 
(u  = Jl) 

N 

max 

a for 

Vx 

2 

4 

24  + 1 

8 

16 

16 

3, 5, 6, 7, 
10,11,12,14 

3 

8 

28  + 1 

16 

32 

256=2b 

3 

4 

16 

2b(*  + 1 

32 

64 

65536=2b 

3 

5 

32 

222  + 1 

64 

128 

128 

/2 

6 

64 

284  + 1 

128 

256 

256 

S2 
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That  is,  a = /2  and  the  resulting  N = 4b  give  the  maximum 
length  possible  for  and  Fg . However,  for  prime  Ffc, 

L. 

further  increases  in  N are  possible  up  to  N = 2°  if  more 
stages  of  the  FFT  algorithm  are  allowed  to  have  multiplica- 
tions rather  than  simple  word  shifts. 

Notice  also  that  besides  a = 3 being  of  order  N = 2b 
there  are  2 - 1 other  integers  of  order  2°,  since: 

<p  (M)  = (2b  + 1)  - 1 = 2b, 

and  the  number  of  primitive  roots  in  Zp  is  given  by  (see 

t 

section  II) 

<M«|»(M))  = 2b(l  - |)  = 2b_1  . 

For  F ^ = 17,  one  sees  that  the  number  of  primitive 
roots  is 

24"1  = 8 (3,5,6,7,10,11,12,14)  . 

The  cyclic  nature  of  modular  arithmetic  means  that,  with- 
out a priori  knowledge,  integers  cannot  be  associated  with 
magnitude.  For  example,  the  days  of  the  week  represent  a 
modulo  7 system,  so  that  the  statement  "Friday  is  after 
Wednesday"  has  no  meaning  unless  the  week  for  the  Friday 
and  Wednesday  in  question  is  specified. 
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This  means  that  any  number  theoretic  transform  unlike 
the  DFT  has  no  physical  significance  like  "frequency." 

It  is  merely  a transform  space,  the  halfway  stage  in  convolu- 
tion. 

During  various  stages  of  the  computation  of  an  NTT, 
each  accumulation  of  signal  "overflows"  many  times. 

But  still  the  end  result  of  the  convolution  will  be 
exact  if  the  input  signals  are  properly  bounded  [17] . That 
is,  a dynamic  range  constraint  is  imposed  by  the  modular 
arithmetic.  One  must  be  able  to  bound  a priori  the  result 
of  convolution  in  order  to  determine  the  true  answer  from 
the  answer  modulo  Ffc. 

The  worst  case  bound  is  determined  by  the  following 
procedure.  If 


where 


means  circular  convolution  of  length  N, 


and 


(4.21) 


then 


N 2 


b +b, 
a b 


(4.22) 


1 
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Example : 

The  circular  convolution  of  two  sequences  requiring  8 
bits  plus  sign,  will  require  at  most,  22  bits  plus  sign  to 
represent  the  output  sequence.  Since 

N = 64  = 26  I c I < 26  • 28+8 

i n 1 — 


ba  * bb  " 8 lc„l  ± 2"  ’ 216 


|c  I <2 

1 n 1 — 


22 


Arithmetic  modulo  F can  be  implemented  using  b = 2fc 
bit  representation  of  integers  with  some  provision  for 
representing  2^. 

Section  V deals  with  the  implementation  of  Fermat  Number 
Transforms,  where  arithmetic  is  carried  modulo  F = 2°  + 1, 


Notice  that  the  maximum  length  of  sequences  which  can 
be  cycled  convolved  using  the  FNT  with  a = 2 is  N = 2b 
(N  = 4b  for  a = /2) , and  therefore  the  length  of  sequences 
which  can  be  convoled  is  proportional  to  the  word  length 
in  bits  (b) . 

Thus  for  long  sequences  the  word  length  requirement 
may  be  excessive. 

Rader  [3]  suggested  using  a two  dimensional  convolution 
scheme  to  convolve  long  one  dimensional  sequences  and 
Agarwal  and  Burrus  [17,  [18]  presented  such  a two  dimensional 
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convolution  scheme.  Using  this  scheme,  cyclic  convolution 
of  length  N = LP  is  implemented  as  a two  dimensional  cyclic 
convolution  of  length  2LXP . 

This  two  dimensional  cyclic  convolution  can  be  imple- 
mented using  a two-dimensional  FNT . Then  the  word  length 
required  is  proportional  to  the  square  root  of  the  length 

of  the  sequences  to  be  convolved  [13] , which  would  give 

2 

for  a maximum  sequence  length  8b  rather  than  4b  (for  a = /2)  , 
ie.,  for  a computer's  word  length  b = 64,  the  maximum  length 
for  the  transform  will  be 

N = 8b2  = 32  768. 

max 

Appendix  A illustrates  with  an  example  such  a scheme  and 
also  several  other  points:  treatment  of  negative  values  in 
data,  the  structure  of  the  transform  and  the  inverse  matrix, 
negative  powers  of  a,  frequent  "overflow"  during  computa- 
tions, meaninglessness  of  the  transform  values  and  exactness 
of  the  final  answer.  This  example  will  not  demonstrate  the 
efficient  implementation  of  the  FNT  using  the  binary 
arithmetic . 

B.  OTHER  NUMBER  THEORETIC  TRANSFORMS 

Mersenne  and  Fermat  number  transforms  are  very  promising 
for  digital  filter  computation  because  they  can  be  calcu- 
lated without  multiplications.  Their  main  drawback  is  a 
rigid  relationship  between  transform  length  and  wordlength. 


i 
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caused  by  the  fact  that  all  operations  are  performed  in  a 
finite  ring  with  arithmetic  carried  out  an  integer  M. 

Another  difficulty  arises  because  it  is  not  possible  to 
achieve  simultaneously  optimum  efficiency  in  reducing  the 
number  of  operations  and  in  implementing  arithmetic  opera- 
tions. This  is  so  because  Fermat  number  transforms  are 
amenable  to  a fast  transform  algorithm,  and  Mersenne  trans- 
forms are  not,  whereas  arithmetic  operations  can  be  imple- 
mented more  efficiently  modulo  a Mersenne  number  than 
modulo  a Fermat  number  [3] , [13] . 

In  what  follows,  other  number  theoretic  transforms  will 
be  described  briefly.  Such  transforms  provide  more  choices 
of  word  length  and  transform  length,  thus  enlarging  the 

possibility  of  the  use  of  NTT  in  digital  filtering. 

2 

1 . Transforms  Over  the  Galois  Field  GF(p  ) 

Reed  and  Truong  [20]  generalized  the  ideas  of 
number  theoretic  transforms  to  transforms  over  the  Galois 
Field  GF(p^)  where  p is  a prime  Mersenne  number,  i.e. 

p = 2q  - 1 , p = 2,  3,  7,  13,  19,  31,  61  ... 

(4.23) 

Notice  that  this  is  a particular  case  of  the  definition  (4.2), 

in  the  sense  that  here  one  is  interested  only  in  p,  a prime. 

2 

Also,  the  Galois  field  GF(p  ) is  a particular  case  of  the 
more  general  GF(pn),  where  one  is  interested  in  the  case 
n = 2. 
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2 

Let  GF (p  ) denote  the  Galois  Field  (finite  field)  of 

2 

p elements,  where  p = 2^  - 1,  p and  q primes.  Let  d be 

. . 2 2 
a divisor  of  p - 1 (possibly  d = p - 1) . Also  let  the 

2 

element  r e GF(p  ),  generate  the  cyclic  subgroup  of  d 
elements , 


Gd  = (r , r2  ...  rd-1,l)  (4.24) 

Then  a transform  over  this  subgroup  G^  can  be  defined  by 
the  following  pair  [20] 

d-1 

A = 7 rKn  , for  0 < K < d-1  (4.25a) 

j\  n — — 

n=0 

and 


d-1 

am  = (d)'1  l r-Km,  for  0 < m < d-1  (4.25b) 

K=0 

. 2 2 
where  d divides  p - 1,  am  and  A^  are  elements  of  GF(p  ) 

and  r is  a generator  of  the  element  subgroup  Gd> 

It  can  be  shown  that  the  cyclic  convolution  property 

holds  for  this  transform  [20]. 

Now,  if 

x2  = -1  mod  p (4.26) 
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is  not  solvable,  then  the  nonsolvability  of  (4.26)  is 
equivalent  to  the  statement: 


(-1)  is  a quadratic  nonresidue  mod  p 


(see  Appendix  B,  eq.  B-2)  . 

By  Euler's  criterion  (Appendix  B,  theor.  B.3),  this  is 
further  equivalent  to: 


& 


= (-1)(P_1)/2  = (_1}  t (2q-l)-l]/2 


= ( 


= (-D 


_1} (2q-2)/2 


(4.27) 


where 


(— ) is  the  Legendre  symbol  defined  by  [see  Appendix 

r' 

B,  eq.  (B. 8) ] 


(f)  -< 


+1  if  a is  a quadratic  residue  mod  p 

-1  if  a is  a quadratic  nonresidue  mod  p 


Thus  (-1)  is  a quadratic  nonresidue  mod  £ and  (4.26)  is 
not  solvable  in  GF(p);  the  polynomial 


P(x) 


x2  + 1 


(4.28) 


74 


is  then  irreducible  in  GF(p).  A root  say,  i,  of 

p(x)  = x2  + 1 = 0 (4.29) 

. . 2 
exists  and  can  be  found  in  the  extension  field  GF(p  ) [20]. 

2 

The  Galois  field  GF(p  ) is  composed  of  the  set 

GF  (p2 ) = {a  + ib  | a,  b e GF(p)}  (4.30) 

1 

A 

where  i is  a root  of  (4.29),  satisfying 

i2  = -1  (4.31) 

where  -1  = (p-1)  mod  p. 

2 ~ 2 
If  x + 1 = 0 is  not  solvable  in  GF(p),  l e GF(p  ) 

plays  a similar  role  over  the  finite  field  GF(p)  that 

/-I  = i plays  over  the  field  of  rational  numbers. 

A A 

For  example,  suppose  (a  + ib)  and  (c  + id)  are  elements 
of  GF (p2 ) , then  by  (4.31) 

(i)  (a  + ib)  ± (c  + id)  = (a  ± c)  + i (b  ± d)  (4.32) 

(ii)  (a  + ib)  (c+  id)  = ac  + i2bd  + ibc  + iad 

(4.33) 

/s 

= (ac  - bd)  + i (be  + ad) 
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These  are  the  analogues  of  what  one  might  expect  if  (a  + ib) 

A 

and  (c  + id)  were  complex  numbers.  In  applications  to 
radar  and  communication  systems  one  generally  wants  to 
take  convolutions  of  complex  numbers. 

If  (-1)  is  a quadratic  nonresidue  mod  p,  then  convolu- 
tion (4.20)  of  the  complex  integers  can  be  performed  with 

2 

transforms  of  type  (4.25)  on  a Galois  Field  GF(p  ) where 

2 

an  and  b^  are  restricted  to  GF(p  ).  In  other  words,  if 
2 

a^,  bn  e GF (p  ),  for  n = 0,1,2,  ...  d-1  the  transforms  are 


A 


K 


d-1 


n=0 


Kn 


n 


d-1 

BR  = l bn  rKn  , for  K = 0,1,  ... 
n=0 

where  r is  a generator  of  a d-element  subgroup  G^. 

Then  taking  the  inverse  transform  of 


d-1. 


C„  = A • B„  , for  K = 0,1,  ...  d-1  (4.34) 

i\  K K 


one  obtains 


d-1 

cn  = (d)’1  l CR  r-Kn  (4.35) 

K=0 

2 2 

where  cfi  e GF(p  ) and  d divides  p - 1. 
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If  p is  a Mersenne  prime,  the  order  t of  the  subgroup 


with  generator  a,  factor  as  follows  [20] 


t = (2q-l)2  -1=2 


2q  _ 


2*2'4  + 1 - 1 


= 2 


2q  _ 2q+i  + 


l-l 


(4.36) 


q+l / ?2q-q-l  _ 


= 2^  (2 


1) 


q+l.-q-l  _ 


= 2^(2 


1) 


Since  t has  the  factor  d = 2q+\  the  usual  FFT  algorithm 

q+l 

can  be  used  to  calculate  transforms  of  as  many  d = 2 
points.  If 


d = 2K,  1 < K < q + 1 


(4.37) 


and  a is  a primitive  element  of  GF(p  ),  then  the  generator 
of  is 


r = a 


(4.38) 


In  (4.25)  if  one  wants  to  take  the  transform  over  GF (p  ) , 
of  2q+^  point  sequences  of  complex  integers,  aR  c GF(p2) 

then  one  needs  to  find  a primitive  element  in  G ^ of 

o ^ 

GF (p  ).  To  achieve  this,  the  following  theorems  are  useful, 

For  proofs,  see  [20]. 
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Theorem  Is 


If  p is  a Mersenne  prime  and  d = 2 , where 

A 

1 2 K <_  q+1,  then  r = a + ib  is  a primitive 

2 

element  m G,  of  GF(p  ),  if  and  only  if 


d/2 

r = -1  mod  p 


(4.39) 


Theorem  2 : 

For  Mersenne  primes  p > 3,  the  first 
quadratic  nonresidue  modulo  p in  the 
sequence  1,  2,  3,  p-1,  is  2* 

To  find  a primitive  element  in  G g+1  of  GF(p  ),  one  can  use 
the  following  procedure. 

Assume  an  element  r = a + ib  is  of  order  2q+^  in  GF(p2). 

Now 


(a  + ib) 


2q  = (a  + ib)  (a  + lb)  2(3-1 


(4.40) 


and,  it  can  be  proved  [20]  that 


(a  + ib) 


2q-l 


?2q-l 

= (a  + i b)  mod  p 


(4.41) 


Since 


£2q-l 


I i2<5-2 


l (?2} (2q-2)/2 


i (-1) <2q-2)/2  . _i 


(4.42) 
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Recall  that  q = 2,  3,  5,  7,  13,  17,  19,  31,  61,  ...  i 
prime.  So  that  (4.40)  becomes 

* 2^  ~ * 

(a  + ib)  = (a  + ib)  (a  - ib)  mod  p 

2 2 

= a + b mod  p 

By  Theorem  1,  it  follows  that 

~ 2q  2 2 

(a+ib)  = a^+b  = -1  mod  p 

since 

- 9q+l  - 2q+1  _q 

d = 2n  and  — 2 — = 2 . 


Let 

X = a2  mod  2q  - 1 
Y = -b2  mod  2q  - 1 

then  (4.44)  becomes 

X + 1 = Y mod  p 

By  definition  in  (4.46),  X is  a quadratic  residue. 
Y,  [see  Appendix  B,  eq.  B.ll,  and  B.9] 


4.43) 

1 

4.44) 


(4.45) 

(4.46) 

For 
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Y = X + 1 is  a quadratic  nonresidue. 

Hence,  one  way  to  choose  the  numbers  X and  Y is  to  let  X 
and  Y be  two  consecutive  numbers  from  the  set  of  integers 
1,2,  ...  2q-2,  such  that  the  first  number  X is  a square  and 
the  second  is  a nonsquare.  By  Theorem  2,  for  p > 3,  Y = 3 
is  a nonsquare  and  the  preceding  element  X = 2 is  a square. 
Thus  it  is  sufficient  to  let 

a2  = X = 2 mod  2q  - 1 (4.47a) 

b2  5 -X-  1 = -Y  = -3  mod  2q  - 1 (4.47b) 

To  find  the  solution  of  congruence  (4.47a)  one  uses  the 
following  procedure  [20]  . 

First,  notice  that  [20] 

(~=T ) = 1 (4.48) 

2P  - 1 
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Then  by  Euler's  criterion  [Appendix  B] 


2q  - 1 


) = 2((2q  1)_1)/2  = l mod  ( 2q-l)  (4.49) 


Multiplying  both  sides  of  the  congruence  by  2,  then 


( (2^-2)/2 ) +1  . 22<3-l  s , nQd  (2q.1( 


Hence 


2q~2  2 q 

(2  ) = 2 mod  (2q-l) 


Then  the  solution  for  (4.47a)  is 


2q-2 

± 2 mod  (2q-l) 


(4.50) 


(4.51) 


Using  the  same  procedure  for  (4.47b),  b is  given  by  [20] 


b = ± (-3) 


„q-2  , _q-i 

2 mod  2 


(4.52) 


In  of  GF (p  ) there  always  exists  a primitive  element, 

A 

r = a + ib  [20].  By  Theorem  1 (4.39) 


(a  + ib)1^2  = -1  mod  p 


(4.53) 


Raising  both  sides  of  (4.53)  to  the  jth  power,  (4.53)  becomes 
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(4.54) 


( (a  + ib)d//2)-^  = (-1)^  mod  p 

A i ; « • 

By  Theorem  1,  ( (a  + ib)  ) -1  is  a primitive  element  only 

when  j is  an  odd  number.  The  elements  are  distinct  and 

include  all  primitive  elements  of  G . . 

d 

. 2 
In  other  words,  in  the  cyclic  subgroup  of  of  GF (p  ), 

~ ^ i 

if  (a  + ib)  is  a primitive  element  then  (a  + ib)J  is  also 

a primitive  element  for  j = 1,3,5,  ...,  d-1. 

Assume  a is  of  order  2q+^  in  GF(p2).  If  d divides 
2+1 

2q+1  then  r = a2  //d  is  of  order  d in  GF(p2). 

9+1 

For  the  transform  in  (4.25),  with  d a factor  of  2 , 

2q+1/d 

one  can  take  r = a ' as  the  primitive  element  and  d 
as  the  transform  length. 

Example:  Find  all  primitive  elements  expressed  as  a sum 

2 

of  powers  of  two  in  the  subgroup  G^  of  GF(p  ), 
where  q=5,  p=2q-l  = 31  and  d = 23. 

To  do  this,  first  find  an  element  with  order 
2q+1  = 25+1  = 64  in  GF(312). 

A 

According  to  Theorem  1,  if  r = (a  + ib)  is  a primitive 

2 

element  in  G , of  GF(31  ),  then 
26 

: 25 

(a  + ib)  = -1  mod  31 

By  (4.44),  that  becomes 

a2  + b2  = -1  mod  31 
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By  Theorem  2,  3 is  a nonsquare  and  2 is  a square.  By  (4.51) 


a 


2 


,5-2 


8 mod  3 1 


By  (4.52) 

5-2  3 

b = (-3) 2 = (-3) 2 = (-3 ) 8 = (28 ) 8 E 20  mod  31 

Thus  64  is  the  smallest  positive  integer  such  that 

~ 64 

(8  + i 20)  E 1 mod  31 
An  element  with  order  2^+^/d  = 64/8  = 8 is 

(8  + i 20 ) 8 = 88  + (8)  8 7 (i20 ) + (8)  86  (i20)2 

+ (8)  8 5 (i20)  3 + (8)  8 4 ( i20 ) 4 

+ (8)  8 3 (i20) 5 + (8)  82  (i20)6 

+ (8)  8 (±20) 7 + (i20)  8 

g 

where  (R)  are  the  binomial  coefficients.  Expanding  and 
solving. 


7 


(8  + i 20) 


16  + lOi  - 10  - 19i  + 9 + 18i  - 7 - 5i  + 


= (27  + 14)  mod  31 


and  since,  if  (a  + ib)  is  a primitive  element  in  then 

A • 

(a  + ib) -1  is  also  a primitive  element  for  j - 1,3,5,  ...  d-1 
One  has 


(27  + i4 ) 1 mod  31 

A ^ A 

(27  + i4)  mod  31  = (4  + 14)  mod  31 

(27  + i4) 5 mod  31  = (4  + i27)  mod  31 

(27  + i4)  7 mod  31  = (27  + i27)  mod  31 

2 

as  the  primitive  elements  in  G0  for  GF(31  ). 

b 

In  order  to  perform  multiplications  by  powers  of  r in 
hardware,  it  might  be  desirable  to  represent  the  q-bit  words 

A 

a and  b,  where  r = a + ib,  as  a minimal  sum  of  powers  of 
two.  Then,  for  example,  r114"*"  mod  p can  be  obtained  by 
multiplying  r11  by  r modulo  p recursively  using  a minimal 
number  of  bit  rotations,  q-bit  word-  complements,  and 
additions  [3] . 

2 

That  is,  the  primitive  elements  in  Gg  of  GF(31  ) can 
be  written: 


19 
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(27  + i4 ) = (25  - 22  - 2°)  + i 22 

(27  + 14 ) 3 mod  31  = 4 + i4  = 22  + i 22 

(27  + i4 ) 5 mod  31  = 4 + i 27  = 22  + i(25  - 22  - 2°) 

(27  + i4)  mod  31  = 27  + i27  = (2  -2 -2 ) + i (2  -2  -2  ) 


Notice  that,  r = 4 + i4  has  the  shortest  number  of  terms  of 

power  2.  This  primitive  element  is  such  that  multiplications 

2 

by  powers  of  r m Gg  of  GF(31  ) yield  the  least  hardware. 

Such  an  r is  called  the  simplest  primitive  element. 

In  order  to  perform  the  convolution  of  two  d-point 
sequences  of  complex  integers  an  and  bR  with  dynamic  range 
A and  B,  respectively,  the  components  of  the  circular 

A 

convolution  c = y + i 6 are  required  to  remain  in  the 

p p p 

interval  [20] . 


_ Eli 

2 


1 V 


(4.55) 


Specifically  to  satisfy  (4.55)  for  all  sequences  an  = an  + ^ 

A 

and  b = x + i y , such  that  la  | , | 6 | < A,  and  |x  | , 

n n n lnllnl—  _ 1 n 1 

|yn|  <_  B,  it  is  necessary  [20],  that: 


dAB  < 

— 4 


(4.52) 


If  A = B,  then  by  (4.52)  the  largest  value  of  A is 
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(4.53) 


where  [x]  denotes  the  greatest  integer  less  than  x.  This 
scaling  constraint  sometimes  forces  one  to  choose  an 
excessive  value  p in  order  to  avoid  overflow.  Such  a large 
p implies  a computer  word  length  that  is  often  undesirably 
long. 

31  8 

Example:  Let  p = 2 - 1 and  let  d = 2 

By  (4.53) 


If  a and  b are  constrained  to  the  interval 
n n 


-210  < a ,B  ,xy  < 210 
— n n n n — 


one  is  guaranteed  to  keep  the  components  of  cp  in  the 
interval 


— (231— l)/2  < x £ (231  - l)/2  . 

According  to  [30]  and  [29] , the  Chinese  Remainder 

theorem  can  be  employed  to  develop  a ring  which  is  the 

2 ... 

direct  sum  of  certain  Galois  fields  GF(p  ).  This  ring  is 
utilized  to  extend  the  dynamic  range  of  complex  number 
convolutions . 
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A disadvantage  of  this  transform  is  that  multipli- 
cation by  powers  of  the  primitive  element  is  not  as  simple 
as  that  developed  by  powers  of  2,  in  Mersenne  transforms 
and  Fermat  number  transforms. 

Recently,  Reed  and  Liu  [36]  developed  a high-radix 

. 2 

FFT  algorithm  for  computing  transforms  over  GF(p  ),  where 
p is  a Mersenne  prime.  This  new  algorithm  requires  substan- 
tially fewer  multiplications  than  the  conventional  FFT. 

2 . Transforms  Over  the  Finite  Field  GF(p) 

Colomb,  Reed  and  Truong  [19]  introduces  a Fourier- 
like  transform  in  GF(p),  where  p is  a prime  of  the  form 
An  = 3 x 2n  + 1.  This  transform  increases  the  variety  of 
methods  and  the  digital  word  lengths  that  can  be  used  for 
computing  the  convolutions  of  integers  beyond  the  previous 
Fermat  or  Mersenne  number  transforms. 

Let  GF(p)  be  the  finite  field  of  residue  classes 
modulo  p,  and  let  the  integer  d divide  p-1.  Also  let  the 
element  y £ GF(p)  generate  the  cyclic  subgroup  of  d elements, 
Gd  of  GF (p) . 

Then  a transform  over  this  subgroup  G^  can  be 
defined  by  the  pair  [19]  : 

d-1 

AR  = l an  Kn  0 £ K < d-1  (4.54a) 

n=0 

d-1 

an  = .(d)"1  l Ak  y Kn  (4.54b) 

K=0 
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where  a , A e GF(p)  for  n = 0,1,2,  ...  d-1  and  (d) 

il  I\ 

is  the  inverse  of  (d)  mod  p.  Also  it  can  be  shown  [19] 
that  the  circular  convolution  of  two  finite  sequences  of 
integers  can  be  obtained  as  the  inverse  transform  of  the 
product  of  the  transforms  defined  by  (4.54). 

Notice  that  if  pn  is  a prime  of  the  form  3 x 2n  + 1 
the  order  t of  GF(pn)  with  generator  a,  is  given  by  (2.14) 

t = Pn  - 1 " 3*2n  <4*55> 


Since  t has  the  factor  d = 2n,  the  usual  FFT 

algorithm  can  be  utilized  to  calculate  the  transform  of  as 
n K 

many  as  d = 2 points.  If  d = 2 , 1 £ K £ n and  a is  the 
generator  of  GF(pn),  then  the  generator  of  Gd  is  (by  2.20) 


Y 


a 


3-2n/2K 


= a 


3-2 


n-K 


(4.56) 


the  basic  operations  used  in  the  transform  defined  by  (4.54) 
are  addition  and  multiplication  mod  3 x 2n  + 1. 

Algorithms  for  searching  a prime  of  the  form  3 x 2n  + 1 
have  been  developed  [19]  , and  will  not  be  discussed  in  this 
thesis . 


As  a final  remark  about  this  transform  one  should  say 

that  the  same  type  of  dynamic  range  constraint  applies  here 

2 

as  in  transforms  over  GF(p  ),  although  in  this  case  GF(p) 
refers  to.  integer  convolutions.  A special  number  theoretic 
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transform  that  can  be  computed  using  a high  radix  FFT  is 
defined  [23]  over  GF(p),  a finite  field  of  integers  modulo 
primes  p of  the  form  (2n-l)2n  + 1.  Such  primes  are  special 
cases  of  numbers  of  the  form  2m  - 2n  + 1 proposed  by  Pollard 
[22]  . 

\ 

If  p is  a prime  of  the  form  p = (2n  - l)2n  + 1 
the  order  t of  a primitive  root  in  the  finite  field  GF(p) 
is  given  by 

t = p - 1 = (2n  - 1 ) 2n  (4.57) 

n $L 

Since  t has  a factor  2 , one  can  choose  d = 2 , where 
1 < l < n as  the  transform  length.  The  arithmetic  in 
GF (p)  with  p = (2n  - l)2n  + 1 is  similar  to  the  arithmetic 
in  GF (p)  where  p is  a prime  of  the  form  3*2n+l.  Addition 
mod  p involves  at  most  a triple  binary  addition  [23] . 

Multiplication  by  a power  of  2 mod  p can  be  imple- 
mented by  using  the  combination  of  table  lookups  and  mod  p 
addition  [23] . However,  multiplications  mod  p still  needs 
a full  binary  multiplication  followed  by  a division  [23] . 
This  seems  a disadvantage  when  compared  with  Fermat  number 
transforms.  The  FFT  over  GF(p)  is  similar  to  the  FFT  over 
the  complex  number  field,  except  that  a root  of  unity  y in 
GF  (p)  replaces  e j^nr/d  an(j  that  integer  arithmetic  modulo 
p is  used. 

It  turns  out  [23],  that  the  number  of  modulo  p 
multiplications  required  for  the  evaluation  of  an  FFT  over 
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GF(p),  p = 2n(2n-l)  + 1,  is  greatly  reduced  when  compared 

to  the  number  of  multiplications  required  to  evaluate  the 

FFT  over  the  complex  number  field. 

It  was  found  [23],  that  the  number  An  = (2n-l)2n+l 

is  prime  only  for  the  cases  m = 1,2,4,32.  Hence  only 

32  32 

transforms  over  GF(p)  when  p = (2  -1)2  +1  have  practical 

application  in  digital  filtering.  The  maximum  transform 
length  in  these  conditions  is  equal  to  4 294  967  2961 

It  can  be  shown  that  high  radix  FFT  can  also  be 
used  to  compute  transforms  over  a finite  ring  modulo  a 
number  of  the  form  (2m-l)2n+l  with  m even  and  m = 1 or 
m = n,  in  a similar  fashion  for  the  GF(p)  case.  The  condi- 
tion for  such  a transform  to  exist  was  shown  in  References 
[22]  and  [20]  . 

3 . Complex  Mersenne  Transforms  and  Complex  Pseudo- 
Mersenne  Transforms 

The  main  limitations  of  the  Mersenne  transform 
approach  are  related  to  the  fact  that  the  number  of  trans- 
form terms  q is  a prime.  This  means  that  calculations  of 
the  transforms  cannot  be  simplified  by  an  FFT-type  algorithm 
and  that  the  number  of  transform  terms  is  equal  to  the 
word  size.  These  limitations  can  be  slightly  alleviated  by 
using  a root  (-2)  instead  of  2 in  (4.4)  and  (4.5),  as  said 
previously  (4.8).  The  maximum  transform  length  then  becomes 
2q.  It  is  also  possible  to  increase  the  maximum  convolution 
size  by  resorting  to  multidimensional  convolutions  [3],  [4], 
[31] . Unfortunately,  this  result  is  achieved  at  the  expense 
of  increased  requirements  for  computation  and  storage. 
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By  defining  Complex  Mersenne  Transforms,  it  is 

possible  to  achieve  higher  computation  efficiency  while 

increasing  both  maximum  transform  length  and  convolution 

length.  Again,  in  a Merseene  ring,  with  p = 2q  - 1,  (2) 

and  (-2)  are  respectively  roots  of  orders  q and  2q, 

corresponding  to  transforms  of  lengths  q and  2q,  respectively. 

ci  d 

Since  j is  a prime,  2 and  -2  are  also  roots  of 
q and  2q,  provided  d is  not  a multiple  of  q (by  2.20). 

Notice  that  ( 2 j ) is  a root  of  order  4q , since 

(2j)4q  = (29)4  (j4)q  = (l)4  (l)q  = 1 mod  2q  - 1 (4.58) 

Also  (1+j)  is  a root  of  order  89,  since 

(i+j)8q  = [(1  +.j)8]q 


and 


, 8 , , 8 » , 8 » ,6  .2  , 8 . i *3  . 8 » i 4 *4 

(1+3)  = 1 + (^  1 3 + (2)  1 3 + (3)  1 3 + (4)  1 3 


,8.  .3  '5,  ,8.  , 2 .6,  ,8.  . .7  .8 

+ (5)  1 3 + (g)  1 3 + (7)  1 3 +3 


= 1 + 8j  - 28  - 56j  + 70  + 56j  - 28’8j  + 1 


= 16  + Oj 
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Then 


(l+j)8q  = (24)q  = ( 2q) 4 = (l)4  mod  2q-l  = 1 mod  2q-l  (4.59) 

; 

The  same  conclusions  can  be  drawn  with  respect  to  (1-j), 
ie.,  (1-j)  is  a root  of  order  8q,  in  the  ring  p = 2q-l, 
for  q = 2,3,5,7,13,17,19,31,61,... 

Higher  order  complex  roots  do  not  have  a simple 
structure  and  therefore  will  generally  not  be  of  practical 
interest. 

i 

Under  these  conditions,  a Complex  Mersenne  Trans- 
form  having  4q  terms  can  be  defined  by  [24], 


4q— 1 

Av  = l a jnK  2nK  mod  (2q-l)  (4.60) 

k n 

n=0 

j = /- 1 , K = 0,1,  ...»  4q-l 

Notice  that  q has  an  inverse  modulo  p,  and  that  the 
inverse  of  4 is  2q  2,  since 

4 = 2 2 ( 4 ) _1  mod  2q-l  = 2q_2  (4.61) 


for 

22 • 2q-2  = 22 • 2q • 2-2  = 2q  = 1 mod  (2q-l) . 
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Then  4q  has  an  inverse  R such  that 


( 4qR)  mod  2q  - 1 = 1 mod  2q  - 1 

and  the  inverse  transform  of  A,  is 

k 

4q-l 

am  = R l ak  j~mK  2-mK  mod  2q  - 1 (4.62) 

K=0 

m=  0,  1,  . . . , 4q-l 

where  all  exponents  and  indices  are  taken,  modulo  4q, 
in  both  (4.60)  and  (4.62). 

Using  a root  (j+1)  leads  to  a definition  of  a 
Complex  Mersenne  Transform  having  8q  terms  with 

8q~1  a (l+j)nK  mod  2q-l  (4.63) 

*K  * l 
n=0 

K = 0,1,  . . . , 8q-l 

and,  with  R such  that  (8qR)  mod  2q-l  = 1 , an  inverse 

transform 


8q-l 

a=  R J A (l+j)_mK  mod  2q-l  (4.64) 

IU  x\ 

K=0 

m = 0 , 1 , . . . , 8q-l . 

with  all  exponents  and  indices  taken  modulo  8q. 
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; 

! 

* 


Computation  of  a complex  convolution  by  means  of 
Complex  Mersenne  Transforms  is  carried  out  as  with  the 
DFT , with  real  and  complex  parts  being  evaluated  modulo  p 
separately.  In  order  to  avoid  errors  due  to  overflow,  the 
amplitudes  of  real  and  imaginary  outputs  must  be  bounded 
to  (p-l)/2.  This  means  that  usually  the  word  length  of 
real  and  imaginary  parts  of  input  sequencds  (an)  and  (xnl 
is  less  than  half  that  of  output  sequences.  In  other  words, 


all  computations  are  carried  out  modulo  p on  q-bit  words, 
yielding  q-bit  word  outputs,  and  the  input  sequences  are 
represented  by  words  of  length  less  that  (q-l)/2  bits. 
Notice  that  the  calculation  of  these  transforms  can  be 
partly  simplified  by  an  FFT-type  algorithm  because  the 
number  of  terms  is  no  longer  a prime. 

It  has  been  shown  [24]  that,  in  the  practical  range 
of  interest  for  q (where  q = 31) , substituting  Complex 
Mersenne  Transforms  for  conventional  Mersenne  Transforms 
results  approximately  in  an  eightfold  reduction  in  the 
number  of  operations. 

If  the  two  sequences  {yn>  and  {an}  to  be  convolved 
are  real,  the  full  benefit  of  using  Complex  Mersenne  Trans- 


forms can  be  retained  by  processing  simultaneously  two 
successive  blocks  of  sequence  {yn>  by  means  of  the  same 
Complex  Mersenne  Transforms.  This  is  done  by  computing  the 


complex  convolution  { Zm } , of  the  sequence  {anJ  with  the 

auxiliary  complex  sequence  {x„  = y„  + j y„,D~}*  The  real 

n n n-rocj 
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part  {um>  and  the  imaginary  part  {um)  of  {Zm>  yield 
respectively  the  convolution  of  {yn>  and  the  next  block 

{yn+8q}  With  {an}* 

Up  to  now,  the  discussion  of  this  section  has  been 
restricted  to  Mersenne  numbers,  that  is,  to  numbers 
p = 2q  - 1 such  that  p is  a prime.  If  p is  not  a prime, 
its  prime  factorization  is  given  by 


l 


(4.65) 


Recall  that  an  m-point  real  transform  having  the  circular 
convolution  property  can  be  defined  in  the  ring  of  integers 


i 

i 

: 


■» 

i 


modulo  p,  provided  m-point  transforms  can  be  defined 
separately  in  the  fields  p^,  p2,  ...,  p^. 

This  follows  directly  from  the  Chinese  remainder 
theorem  (2.27),  and  leads  to  the  conditions  for  the  exis- 
tence of  an  m-point  transform  in  the  ring  p that  m must 
simultaneously  divide  p^-1,  P2_l/  ’**'  when  P is 

a prime , the  maximum  length  of  the  transform  is  M = p-1. 

Transforms  in  a ring  p,  with  p nonprime,  are  there 
fore  proportionally  shorter  than  transforms  defined  modu  . 


a prime  number.  If  p and  q are  composites  with  q = q^ 
ql 

and  q^  prime,  2 -1  divides  p and  the  maximum  trars: 

ql 

length  is  governed  by  that  possible  for  2 -1. 

This  led  Rader  [3]  to  consider  that  the  c : y t : a 
of  interest  in  a ring  modulo  2q-l  were  Merser  : 
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The  situation  changes  noticeably  if  one  considers 
Comples  Mersenne  Transforms.  Nussbaumer  [24]  shows  that 
given  an  m-point  real  transform  of  root  2 with  p composite 
and  q,  ,m  odd  integers,  one  can  define  8m-point  complex 
transforms  in  the  ring  modulo  p with 


AK 


8m- 1 


l 

n=0 


a 


n 


(1+j) 


amK 

9 


j = /=T 

K = 0,1, 


(4.66a) 

8m- 1 


8m- 1 

am  = (8m)-1  l (l+j)-ajnK,  m = 0,1,  ...,  8m- 1 

K=0 

Notice  that,  the  existence  of  an  m-point  real  transform 
in  the  ring  modulo  p implies  that  m has  an  inverse, 

R modulo  p.  Also,  the  inverse  of  8 modulo  p,  is 

8_1  = 2q_ 3 since  8 x 8-1  = 23-2q-3  = 23-2q*2-3  = 2q  = 1 


thus 


(8m)  1 ~ 2q  3 • R mod  p exists. 

Table  I shows  the  various  possibilities  for  p 
nonprime  and  odd  q. 

The  case  of  q prime  (q  = 23,29,37,41,43,47)  corres- 
ponds to  conventional  Mersenne  Transforms.  When  q is  not  a 


mod  p 
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-2  ROOTS  FOR  REAL 


127-4432676798593  (2  • 3 • 7 (2 ' • 3 • 7 • 43-  337 • 5419.) 


prime,  the  corresponding  transforms  are  called  pseudo- 

Mersenne  transforms.  They  have  a very  short  length  and 

their  roots  are  not  powers  of  2.  Notice  that,  in  order  to 

achieve  maximum  effectiveness  in  computing  convolutions  by 

means  of  pseudo-Mersenne  Transforms,  it  would  be  desirable 

to  have  relatively  long  transforms  with  a number  of  terms 

highly  factorizable.  This  does  not  seem  possible  with 

transforms  modulo  p = 2q-l.  One  notes,  however  that  when 

p is  not  a prime,  with  the  prime  factorization  of  p defined 

di 

by  (4.65),  one  can  define  transforms  modulo  p/p^  having 
power  of  2 roots  and  such  that  the  number  of  terms  is 
large  and  highly  factorizable. 

These  transforms  can  be  defined  by 


8m- 1 


d. 


( I d+j)  nK)  mod  p/pi  1 

n=0 


(4.67) 


K * 0,1  ...  8m-l 

j = /-T 


Various  possibilities  of  such  transforms  are  listed  in 
Table  II. 

It  can  be  seen  that  the  maximum  number  of  terms  is 
both  large  (40  to  392  terms)  and  highly  factorizable,  thereby 
leading  to  efficient  FFT  type  computation  with  a minimum 
number  of  operations.  It  would  seem,  however,  that  these 
advantages  are  offset  by  the  fact  that  the  various  operations 
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AM)  ROOTS  FOR  REAL  AND  OQMPI£X  TRANSFORMS  IN  HIE  RING  (2q-l)/p. 


are  performed  modulo  (2q-l)/pi  . The  corresponding  arith 

metic  circuits  are  much  more  complex  than  arithmetic 
circuits  modulo  2q-l  [24]. 

This  difficulty  can  be  circumvented  by  noticing 

that  as 


one  can  compute  the  convolution  modulo  p = 2q-l  as  with 

conventional  Mersenne  Transforms  and  obtain  the  final 

di 

result  by  performing  a last  operation  modulo  p/p^  on 
the  convolutions  computed  modulo  p,  i.e., 

d d. 

Zm  mod  p/p^^  = (Zm  mod  p)  mod  p/p.^  (4.68) 

By  proceeding  in  this  fashion  relatively  long 
convolutions  can  be  computed  efficiently  by  means  of  FFT- 
type  algorithms  with  all  but  the  last  operation  performed 
with  implemented  arithmetic  circuits  operating  modulo  (2q-l) 
[24]  . 

Taking  as  an  example  the  case  of  transforms  defined 

by  q = 25,  one  can  see  from  Table  I that  the  maximum  odd 

25 

length  for  real  transforms  computed  modulo  p - 2 -1  is 

15  terms  and  that  the  corresponding  roots  are  not  powers 

25 

of  two.  By  operating  modulo  (2  • -1)/31,  it  is  possible 
to  define  real  transforms  having  power-of-two  roots  with  a 
maximum  odd  length  increased  to  25  terms.  The  maximum 
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length  is  then  expanded  to  200  terms  by  using  complex 
roots.  Such  a transform  can  be  computed  very  efficiently 
by  means  of  an  FFT  type  algorithm  with  a three-stage  radix 
2 decomposition,  followed  by  a two-stage  radix  5 decomposi- 
tion. 

One  limitation  of  conventional  Mersenne  Transforms 
is  the  rigid  relationship  between  word  length  and  trans- 
form length.  In  this  respect,  pseudo-Mersenne  Transforms 
provide  a significant  improvement  because  their  maximum 

number  of  terms  M is  highly  composite  and  any  transform 

max 

length  submultiple  of  M can  be  selected. 

max 

It  is  even  possible  to  have  several  transforms  of 
identical  length  and  defined  modulo  integers  p^,  ...  pi 
that  are  relatively  prime.  The  convolution  car  then  be 
computed  separately  modulo  p^,  ...  p^  and  then  the  final 
result  obtained  modulo  (p1*P2*‘,Pi)  by  the  Chinese  remainder 
theorem.  This  approach  could,  for  instance,  be  used  to 
compute  a 40-term  convolution  with  an  approximate  word 
length  of  32  bits  by  means  of  transforms  defined  by 
modulo  (215-l)/7  and  (225-l)/31. 

4 . Pseudo  Fermat  Number  Transforms  and  Complex 

Pseudo  Fermat  Number  Transforms 

This  section  considers  a generalization  of  Fermat 
Number  Transforms,  such  that  transforms  having  roots  which 
are  powers  of  2 are  defined  in  field  or  ring  which  is  a 
factor  of  an  integer,  p = 2q+l.  With  such  pseudo  Fermat 
Number  transforms  it  is  possible  to  have  much  more  flexibility 
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in  selecting  desired  word  lengths  than  with  conventional 
Fermat  Number  Transforms.  In  some  cases  it  is  possible 
to  define  complex  pseudo  FNT's  which  are  well  adapted  for 
filtering  complex  signals  and  allow  increased  transform 
and  convolution  lengths  when  compared  to  conventional 
FNT 1 s . 

Number  theoretic  transforms  in  a ring  submultiple 
of  p = 2^+1  are  called  pseudo  Fermat  Number  Transforms 
[37]. 

If  one  restricts  to  roots  2 , one  can  define  an 
M-term  pseudo  FNT  and  its  inverse  by 

M-l  d 

^ * ( l an  2wnK)  mod  p/pi  1 (4.69a) 

n=0 

M-i  d 

am  = (R  £ Ar  2~wmK)  mod  p/Pi  1 (4.69b) 

K=0 

m = 0,1  ...  M-l 

di 

with  M*R  = 1 mod  p.p^ 

It  can  be  seen  that  these  transforms  have  the  same 
structure  as  pseudo  Mersenne  transforms  but  are  defined  in 
a ring  submultiple  of  2q+l  instead  of  a ring  submultiple 
of  2^-1  for  pseudo  Mersenne  transforms. 

The  choice  of  the  particular  ring  on  which  pseudo 
FNT's  are  defined  is  very  important.  Usually,  p will  be 


H 
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divided  by  its  smallest  factors , with  the  remaining  factors 
large  enough  to  allow  defining  long  transforms  with  powers 
of  two  roots.  Pseudo  FNT's  can  be  defined  to  q even  and  q 
odd  [37] . Various  possibilities  for  such  transforms  with  q 
even  are  listed  in  Table  III. 

It  can  be  seen  that  there  is  much  more  flexibility 
in  word  length  and  transform  length  selection  than  with 
transforms  defined  modulo  2q+l. 

Here  also  as  in  the  case  of  pseudo  Mersenne  trans- 

q d . 

forms,  performing  the  various  operations  modulo  2 +l/p^ 
would  seem  rather  awkward,  as  the  corresponding  arithmetic 
circuits  are  much  more  complex  than  those  operating 
mod  (2q+l) . Again  the  solution  is  to  compute  the  convolu- 
tion modulo  p = 2q+l  as  with  conventional  FNT's  and  obtain 

d 

the  final  result  by  performing  a last  operation  modulo  p/p^ 
on  the  convolution  evaluated  modulo  p: 

d.  d. 

Zm  modulo  p/p.^  1 = (Zm  modulo  p)  modulo  p/p^  1 

(4.70) 

Assuming  the  factorization  of  the  number  of  transform  terms 
is  given  by 


M = Mx  • M2  • • • Mj  (4.71) 

the  pseudo  FNT  can  be  computed  by  a mixed-radix  FFT-type 
algorithm. 
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5 -277 -1013. 1657 -30269  (2  +l)/5  92  13  920  773  304  712  2 

(22.23) 


If  q is  odd,  it  is  possible  to  increase  the  maximum 
transform  length  [37]  by  using  complex  pseudo  FNT's.  The 
existence  of  complex  pseudo  FNT's  can  be  demonstrated  by 

considering  an  M term  pseudo  FNT  defined  in  the  ring 

^i  3 ^i  W 

p/p^  = (2'+l)/p^  with  a root  2 of  order  M. 

Notice  that  if  W and  q are  odd,  the  condition 

M • W = 2q  (4.72) 

implies  that  M is  even,  and  M/2  is  odd.  Under  these 

w 

conditions,  (-2)  is  a root  of  order'  M/2  since 

[(-2)W]M/2  = (-2)WM/2  = (-2)2q/2  = (-2) q 

= (-2q+2q+l)  = 1 mod  2q+l  (4.73) 

Wd 

and  (-2)  is  also  a root  of  order  M/2,  provided  d and  q 

W . 

have  no  common  factors  [37].  This  suggests  that  (2j)  is 
a root  of  order  2M,  since 

(2j)W2M  = (2 j ) 4q  = (2q) 4 ( j 4 ) q = (-1) 4 (l)q  = 1 mod  2q+l 

(4.74) 

w 

and  (1+j)  is  a root  of  order  4M  since 

(l+j)W4M  = (l+j)8q  = (24 ) q = (2q) 4 = (-l)q  = 1 mod  2q+l 

(4.75) 
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thus  a 2M  term  pseudo  FNi  can  be  defined  as  [37] 


2M-1 

( l a 

n=0 


n ( 2 j ) WnK)  mod  p/pi 


j = /-I 
K = 0,1 


2M-1 

(4.76a) 


d. 

As  M has  an  inverse  in  the  ring  p/p.  1 and  2 is  relatively 

1 d. 

prime  with  p,  2M  has  an  inverse  R in  the  ring  p/p^  1 and 
an  inverse  transform  can  be  defined  by  [37] 

2M-i 

am  = (R  l Ar  ( 2 j ) -WmK)  mod  p.Pi  1 (6.76b) 

K=0 


m = 0 , 1,  ...  2M-1 


with  all  exponents  and  indices  taken  modulo  2M. 

It  can  be  demonstrated  that  the  transform  satis- 
fies the  convolution  theorem  [37],  and  that  two  complex 

sequences  of  length  2M  can  be  cyclically  convolved  via 

d. 

complex  pseudo  FNT's  modulo  p/p^  1. 

In  such  an  approach,  all  arithmetic  operations  are 

2 

performed  as  in  normal  complex  arithmetic  with  j = -1, 

and  real  and  imaginary  parts  treated  separately  modulo  p. 

The  final  convolution  product  is  obtained  by  performing  a 

di 

last  operation  modulo  p/p^^  . 

A 4M-points  complex  transform  can  be  defined  by 
using  a root  (1+j)  instead  of  (2j)  [37] 
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(4.77) 


4M-1  d 

Ar  = ( I an  (l+j)WnK)  mod  p/pi  1 

n=0 

K = 0,1  ...  4M-1 

Higher  order  complex  roots  have  a complex  structure  and 
therefore  the  maximum  length  of  complex  pseudo  FNT's  which 
can  be  computed  without  multiplications  is  4M  in  the  general 
case,  and  8q  when  W = 1 (M  = . 

It  can  be  seen  that  using  complex  pseudo  FNT's 
allows  for  a given  computation  complexity,  operation  over 
transforms  and  convolution  lengths  twice  as  long  as  with 
real  Fermat  and  pseudo  FNT's. 

In  particular,  when  W = 1,  the  maximum  length  of 
an  FNT  is  4q  for  a root  /2,  while  the  maximum  length  of  a 
complex  pseudo  FNT  is  8q  for  a (1+j)  root. 

Various  possibilities,  q odd,  for  complex  pseudo 
FNT’s  are  listed  in  Table  IV.  It  can  be  seen  that  for  q 
prime,  we  have  complex  transforms  of  length  8q  with  root 
(1+j).  These  transforms  have  a number  of  terms  which  are 
not  highly  f actorizable . 

A more  interesting  case  seems  to  correspond  to 
those  nonprime  values  of  q,  for  which  it  is  possible  to 
define  transforms  with  a number  of  terms  which  are  both 
large  and  highly  factorizable  and  therefore  lead  to  efficient 
computation  via  an  FFT  type  algorithm. 
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FOR  COMPLEX  PSEUDO  POTS  IN  THE  RING  (2q+l)/p.  1 WITH  q ODD 


3-43-4  363  953  127  297  (2’  +l)/3-43  392  4 352  820  593  809  (j+1) 

(23-72) 


In  these  respect,  the  200-points  and  392-points 

25 

transforms  defined  respectively  modulo  (2  +1)/3.11  and 

49 

(2  +l)/3.43  are  particularly  attractive. 

It  is  sometimes  desirable  to  compute  convolutions 
with  improved  dynamic  range.  In  this  case  the  same  convo- 
lution can  be  computed  modulo  several  relatively  prime 
integers  p.^,  p2,  ...  p^  and  the  final  result  obtained 
modulo  (p1*p2**.pi)  via  the  Chinese  remainder  theorem. 

For  this  application,  the  availability  of  complex  pseudo 
Mersenne  transforms  and  pseudo  FNT's  having  the  same  length 
and  defined  modulo  relatively  prime  integers  is  particularly 
interesting.  A 200-point  convolution  could  for  instance 

be  computed  with  a dynamic  range  of  about  40  bits  via  complex 

25 

pseudo  Mersenne  transforms  defined  modulo  (2  -1)/31  and 

25 

via  complex  pseudo  Fermat  transforms  modulo  (2  +D/3.11. 
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V.  IMPLEMENTATION  OF  FERMAT  NUMBER  TRANSFORMS 


The  best  known  number  theoretic  transform  is  the  Fermat 
Number  Transform  (FNT) . FNT  are  potentially  attractive 
for  digital  filtering  applications  because  they  have  the 
convolution  property  (3.5)  and  can  be  computed  without 
multiplications . 

In  principle,  such  Number  Theoretic  Transforms  (NTT's) 
could  be  implemented  in  the  same  way  as  Discrete  Fourier 
Transforms  but  with  multiplications  by  trigonometric  func- 
tions replaced  by  multiplications  by  powers  of  two,  all 
operations  being  performed  modulo  a Fermat  number. 

In  practice,  however,  direct  transposition  of  Fast 
Fourier  Transform  (FFT)  architectures  does  not  necessarily 
lead  to  optimum  implementations  and  the  development  of 
special  configurations  to  computing  NTT's  seems  worth 
exploring.  Along  these  lines,  the  special  attributes  of 
the  FNT,  led  several  researchers  to  consider  various  coding 
techniques  for  simplifying  the  implementation  of  the  trans- 
form and  special  purpose  implementations  of  the  FNT. 

This  section  will  discuss  these  concepts.  In  part  A 
various  coding  techniques  are  presented  and  in  part  B the 
realization  of  the  FNT  is  considered. 

A.  BINARY  ARITHMETIC  FOR  THE  FERMAT  NUMBER  TRANSFORMS 

In  computing  the  FNT,  arithmetic  is  done  modulo  Ffc  = 2b, 
b = 2t  (4.11).  In  this  arithmetic  the  only  allowed  integers 
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are  {0,1  ...  2b)  and  all  integers  whose  absolute  value  do 
not  exceed 


Ft_1  _ (2b+l) -1  _ 2b  _ „b-l 

2 “ 2 T ' 2 


(5.1) 


can  be  represented  unambiguously . 

Negative  numbers  are  represented  by  adding  2b+l  to 
them;  this  is  similar  to  twos  complement  and  ones  comple- 
ment representation  of  negative  integers. 

Notice  that  in  a b-bit  register,  all  integers  from 
0 to  2-1  can  be  represented. 

Example;  Let  F = 2b+l  = 24+l  = 17  b = 2t  = 22  = 4 

(i)  allowed  integers  {0,1,2,  ...,  16  = 24} 

(ii)  absolute  values  of  number  that  can  be  represented 
unambiguously  do  not  exceed 

= iZf-i  = 8 = 2b-1  = 24'1  = 8 

(iii)  negative  numbers; 

-5  = (-5  + 17)  « 12  mod  17 

(iv)  a b = 4 bit  register  allows  as  maximum 

11112  = 8 + 4 + 2 + 1 = 151q  » 2b-l  = 24-l  - 151q 
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Thus,  using  a b-bit  register,  the  problem  remains  to 

b b 4 

represent  the  quantity  2 (in  the  example  above,  2°  = 2*  = 16) . 

If  data  is  uncorrelated,  the  probability  that  this  number 
will  appear  after  an  arithmetic  operation  is  approximately 
2"b  [17]. 

For  digital  filter  applications,  b would  typically  be 
32  or  64;  in  these  cases  the  occurrence  of  2b  is  extremely 
small. 

In  their  software  realization  of  the  FNT,  Agarwal  and 
Burrus  [4]  define  a binary  arithmetic  modulo  F = 2b+l, 
b = 2fc.  The  representation  of  such  a modulus  requires 
(b+1)  bits,  for  the  representation  of  the  quantity  2b  = 1 
mod  Ffc.  In  order  to  simplify  modular  arithmetic,  Agarwal 
limits  his  realization  to  a b-bit  arithmetic. 

This  involves  some  input  quantization  error  where  (-1) 
occurs  as  an  input  sample,  as  well  as  the  extremely  small, 
but  realistic,  probability  of  a complete  data  block  in 
error  when  a (-1)  occurs  as  the  result  of  an  FNT  computation. 

The  following  discussion  is  based  on  the  b-bit  represen- 
tation of  integers.  The  various  basic  arithmetical  opera- 
tions can  be  implemented  modulo  Ffc  [4] . 

(i)  Negation 

b-i  L 

Let  A = I a . 2 , a.  = 0 or  1 (5.2) 

i=0  1 1 

Then  by  [ 4 ] 
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b-1  b-1 

-A  - l ai  21  = l 57  21  - (2b-l)  (5.3) 

i=0  i=0 

Example:  Let  A = 8 = (1000)2  = 8 mod  17  (=  24+l) . 

Then 

-A  = (0111)  - (24-l)  = 7 - (15)  = -8 

Notice  that  (5.3)  can  be  arranged  in  the  following  form: 

b-1 

-A  = l aT  21  - (2b-l) 
i-0 

b-1 

* l a~  21  - (2b-l)  + (2b+l)  mod  F 
i«0 

b-1 

, / 

= l JT  21  + 2 (5.4) 

i=0 


Thus  to  negate  a number,  one  has  to  complement  each  bit 
and  add  2 to  it. 

Example:  Let  F = 17 

0111 

A =»  8 = 1000  -A  = -8  = { +10 } " 9 mod  17 

1001 
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Example: 


17 


Let  F. 


A = 8 = 1000;  -8 


24  + 1 


0111 


1001 


= 9 mod  17 


(ii)  Addition 

When  one  adds  two  b-bit  integers,  one  obtains  a 
b-bit  integer  and  possibly  a carry  bit.  The  carry  bit 
represents  2b  = -1  mod  Ffc. 

To  implement  addition  in  arithmetic  modulo  2b+l, 
one  simply  subtracts  the  carry  bit.  Thus  the  hardware 
should  be  of  the  carry  subtract  type. 


Example : 


Let  F = 17 


1111 

15  + 10  - 25  = 8 mod  17  =<  + 1010  >=  8 mod  17 


1000 


(iii)  Subtraction 

Subtraction  is  implemented  as  an  addition  by  first 
negating  the  subtrahend  and  then  adding  terms.  Addition 
must  be  carried  out  according  to  (ii) . 


Example : 


Let  F - 17 


13-4 


13  + (-4) 


13  = 1101 

-4  = (-X5100  -*1011 


1101 


9 mod  17 

(iv)  General  multiplication 

When  one  multiplies  two  b-bit  integers,  one  gets 
a 2b-bit  product.  Let  CL  be  the  b-bit  low  order  of  it 
and  be  the  b-bit  high  order  part  of  it,  then 

AxB  = CL  + CH  2b  = CL  + (-CH)  mod  F ■ 

Thus,  all  one  has  to  do  is  subtract  the  higher  order  b-bit 
register  from  the  lower  order  b-bit  regis-ter.  The  subtrac- 
tion needs  to  be  done  according  to  (iii)  . 

2 

Example:  Let  Ffc  ® 2 + 1 = 17 

15x10  = 150  = 14  mod  17 


(v)  Multiplication  by  a power  of  2 

If  x is  taken  as  2 or  a power  of  2 (in  3.7), 
the  only  multiplications  involved  in  taking  the  Fermat 
transform  are  those  by  some  power  of  +2.  These  multipli- 
cations are  particularly  simply  to  implement  in  arithmetic 
modulo  Ffc.  Suppose  one  needs  to  multiply  A by  2 , 

0 < K < b,  all  one  needs  to  do  is  shift  to  the  left  the 
content  of  the  register  by  K bits,  and  subtract  the  K 
overflow  bits  (assuming  double  b-bit  registers) . If  K is 
outside  the  range  0 < K < b,  one  makes  use  of  the  fact 
2b  * -1  mod  Ffc. 

Computation  of  the  inverse  transform  requires 
multiplications  by  negative  powers  of  2 which  can  be 
converted  to  positive  powers  by  the  following  relationship 


-2b_K  mod  Ft 


(5.5) 


Example : 


Let  Ffc  = 2 + 1 = 17 


(i) 


15  x 2i 


60 


9 mod  17 


15  = 0000  1111 


Shift  left  2 positions 

‘■'H 


Cu  = 0011  — * 1100 

H 

+ 10 

-CH  = 1110 


(-cH> 


- 1100 
- 1110 


1001  * 9 mod  17 
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(ii)  13  x 2 3 = 13  x (-24"3)  = 13  x (-2)  = -26  = 8 mod  17 

13  = 0000  1101 

CH  CL 

Shift  right  dircularly  3 positions  1P_LP.  000^ 


cL  - 1010  0101  cH  = 0001 

+10  (-CL)  = 0111 

-CL  = 0111  1000  = 8 mod  17 

For  the  implementation  of  the  Fast  Fermat  Transform,  unlike 
the  FFT,  one  does  not  need  to  store  the  powers  of  x.  For 
serial  arithmetic,  one  could  have  a register  which  stores 
the  shift  amount  K,  and  as  one  goes  along  the  Fast  Fermat 
Transform  flow  chart,  one  continually  update  the  shift 
amount.  This  realization  of  b-bit  arithmetic  involves,  as 
previously  said,  some  input  quantization  error  when  (-1  = 2 ) 
occurs  as  one  input  sample,  as  well  as  the  extremely  small, 
but  realistic,  probability  of  a complete  data  block  in  error 
when  a (-1)  occurs  as  the  result  of  a FNT  computation. 

It  is,  of  course,  desirable  to  compute  the  FNT 
exactly.  The  difficulties  in  performing  binary  arithmetic 
modulo  Ffc,  become  apparent  when  one  considers,  for  example, 
multiplication  or  addition  in  a ring  of  integers  modulo  Ft, 

W 

involving  the  binary  representation  of  -1  = 2 . For  example, 
when  b = 4,  2^+1  = 17  the  product  of  10000  (-1)  with  itself 
is  0000M+1)  . 
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A technique  for  the  exact  computation  of  the  FNT 
and  its  implementation  are  described  by  McClellan  [26]. 
McClellan's  approach  involves  the  definition  of  a new  binary 
code  representation  for  the  integers  modulo  Ffc. 

Given  a binary  representation  of  (b+1)  bits, 

A = [a^/  ab— 1 ***  ao^  (5.6) 

this  new  code  is  described  as  follows.  If 

a^  = 1 then  A = 0 

(5.7) 

1 1_  — O 

a,  = 0 then  A = a,  . 2 + a.  0 2 + . . . + a 

b b-l  b-2  o 

where 

a . 

3 

Example : 

10000  represents  zero 

00111  -23  + 22  + 21  + 1 = -1  = 16  mod  17 

01011  23-22  +2+1=7  mod  17 


-1 


if  a.  = 1 
3 


if  a . = 0 
3 


(5.7a) 


Let  Ft  = 2b+l  = 24+l  = 17,  b = 4 
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and  10101  is  an  illegal  combination,  since  the  only  allowed 
number  with  the  most  significant  bit  (MSB)  equal  to  1 is 
10000  (zero) . Consider  arithmetic  operations  using  this 
number  representation. 

(i)  Multiplication  by  a power  of  two. 

If  the  number  is  zero  (i.e.,  a^  = 1)  one  does  nothing. 

If  the  number  is  nonzero,  the  low  order  b bits  are  circularly 
shifted  to  the  left  a number  of  places  equal  to  the  power 
of  2,  and  a bit  is  replaced  by  its  complement  as  it  enters 
the  least  significant  bit  position  (LSB) . 

Example:  Let  Ffc  = 2^+1  = 2^+1  =17,  b = 4 

Using  the  proposed  code 

01100  = 23  + 22  - 2 - 1 = 9 mod  17 


applying  the  above  rule 


9x2  = ? 


4 lower  order  bits 


,3  _,2_ 


1 mod  17 
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Further , 


9x8  = ? 


9 01100 


00000 

72  = 4 mod  17  = 00001  = -23-22-2+l  = -13+17  = 4 mod  17. 

In  a hardware  implementation  the  MSB  is  used  as  a control 
bit.  If  it  is  one  then  the  number  is  zero  and  the  rotation 
is  inhibited.  This  is  characteristic  of  all  operations 
using  this  new  coding  scheme. 

(ii)  Negative  of  a number 

This  is  done  by  complementing  the  low  order  b bits 
except  in  the  case  where  afe  = 1.  Again  the  MSB  is  a control 
bit  that  inhibits  the  operation  if  it  is  one. 


120 


Example : 


Let  Ffc  = 2b+l  = 24+l  = 17;  b = 4 
Using  the  proposed  code 


A = 01101  = 23  + 22  - 2 + 1 = 11  mod  17 

and  by  applying  the  above  rule 

-A  = 00010  = -23-22+2-l  = -11  mod  17 

(iii)  Addition 

If  either  or  both  of  the  operands  for  addition  are 
zero  (i.e.,  a^  = 1 or  = 1),  then  there  is  no  addition 
to  take  place.  That  is,  these  special  cases  can  be  sensed 
and  the  addition  inhibited.  Now  consider  the  addition  of 
two  numbers  A and  C where  A ^ 0 and  C ^ 0 . 

Let 


and 


A 


ab  ab-l 


a with  a,  = 0 

o b 


(5.8) 


C = c.  c.  . . . . c with  c,  = 0 
b b-1  o b 

Interpret  the  b LSB's  of  A and  C as  the  binary 

A A A A 

representation  of  A and  C,  and  form  the  same  A and  C using 

A 

unsigned  binary  addition  to  obtain  D. 
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That  is 


A 

+ 

C = c.  . 2b_1  + c.  0 2b-2  + ...  + c 
b- 1 b-2 o 

A+C  = D = d.  2b  + d,  . 2b_1  + dw  „ 2b~2  + ...  + d 
b b-1  b-2  o 

A 

It  is  possible  to  deduce  from  D,  the  desired  sum 
D = (A+C)  mod  2b+l  [26]. 

Notice  that 

/\ 

A = 2A  + 2 [26]  (5.10) 

Example:  Let  F = 2b+l  = 24+l  = 17 

Using  the  proposed  code 


_ ,b~l  . . ,b-2 

ab-l  2 ab-2  2 + 


+ a. 


(5.9) 


! A = 01010  = 23-22+2-l  = 5 mod  17 

f 


by  (5.9) 

?• 

A = (1)  23  + (0)  22  + (1)  21  + 0 (2°) 

= 8 + 2 - 10 
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J 


[ 


and, 

2A  = 2(10)  = 20 

+ 

2 

2A  +2  = 20+2  = 22  = 5 mod  17  = A. 

/x 

To  deduce  from  D the  desired  sum  D,  verify  that 

A A A A « 

D = A+C  = 2A+2  + 2C+2  = 2A+2C+4  mod  2+1  . 

Example:  Let  Ffc  = 17 

and 

A = 01010  = 5 A = ( 1 ) 2 3 + ( 0 ) 2 2 + (1)  2 = 8+2  = 10  mod  17 

C = 00011  = 8 C = (D21  + (1)2°  = 3 mod  17 

So 

A A 

D = 2A  + 2C  + 4 * 2(10)  +2(3)  + 4 = 30  = 13  mod  17 

and,  checking 

D = C + A = 8 +5  =13  mod  17. 
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If  D can  be  expressed  as 


D = 2 D + 2 mod  2b  + 1 (5.12) 

with  D a b-bit  number,  then  the  b-bits  of  D are  the 
b-LSB ' s of  D [26]  . 

Example:  Let  Ffc  = 17 

and 

D = 01010  = 23  - 22  + 21  - 1 = 10  - 5 = 5 mod  17 

Since  D can  be  expressed  as: 

D = 2 D + 2 mod  17 

where 

D = 1010  = ( 1 ) 2 3 + (0)  22  + (1)  2 + (0)  1 = 8 + 2 = 10 

(check:  D - 2 D + 2 = 2(10)  + 2 = 22  = 5 mod  17) 

the  4 bits  of  D = 1010  are  the  4 LSB's  of  D = 01010. 

There  are  two  cases  depending  on  the  value  of  d^  in  (5.9). 

1 - If  d.  * 1,  then  by  [26] 

D = 2b  + D ' - D'  - 1 mod  2b+l  (5.13) 
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Example 


17 


and 


Let  F = 


A = 01010  = 5 mod  17 
C = 01011  = 7 mod  17 


one  has 


A = (1)  23  + ( 0 ) 2 2 + (D21  + (0)1  = 8 + 2 = 10 

C = (1)  23  + (0)  22  + (D21  + (1)1  = 8+2  + 1 = 

D = (1)  24  + ( 0 ) 2 3 + (1)22  + ( 0 ) 2 1 + 1 = 21  = 4 


Comparing  with 


D - d.  2b  + d.  . 2b_1  + , + d^ 
b D-l  o 


one  verifies  that 


in  these  conditions 


D 


2 4 + D'  mod  17 


11 

mod 


where,  by  (5.13) 


A 

D'  = D + l = 4 + 1 = 5. 


Then 

A 

D=  16+5  =21=  4=  5-1  = 4 mod  17, 

checking  (5.13).  Thus, 

D = (2A  + 2C  + 4)  mod  2b+l  = (2D+4) 

D = (2C'  + 2)  mod  2b  + 1 

and 

C = C* 

Example:  Let  F = 17 

A = 01010  = 5 
C = 01011  - 7 
D - A+C  = =12 

one  has 

A 

A = 10 

C = 11 

A /\  A 

D = A+C  = 21 


mod  2b+l 


(5.14) 


(5.14a) 
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so 


^ A , 

(1)  D = 2 (A  + C ) +4  mod  2°  + 1 

D = 2(10+11)  + 4 = 42  + 4 = 46  = 12  mod  17 

/N 

(2)  D = (2  D + 4)  mod  17 

D = 2(21)  +4  =46  = 12  mod  17 

(3)  In  the  previous  example  D'  = 5 

so 


D = 2(5)  + 2 = 12  mod  17 


The  condition  D = D' , in  (5.14a),  can  be  verified: 

1 • 

A = 01010  = 5 

+ 


01011 


D = A+C  = Q0101 
0101 


-8  + 4 - 2 + 1 = -5  = 12  mod  17 


Since 

D = ( 0 ) 2 3 + (1)22  + (021  + 1 = 5 mod  17  (by  5.12) 

then 


D = D'  = 5 mod  17. 
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1 


2 - If  dfa  = 0 in  (5.9),  then  [26] 

D = 2D'  + 4 mod  2b+l 


I 

I 


(5.15) 


and 


D = D'  + 1 


(5.16) 


Example: 

Let 

*1 

rt 

ii 

17, 

r b 

A = 00 

111  = 

16 

mod 

17 

C = 01 

Oil  = 

7 

mod 

17 

Qooio 

0 


D = 00  010  = -8-4+2-1  = -11  = 6 mod  17 


Since  by  (5.12) 

D = ( 0 ) 2 3 + ( 0 ) 2 2 + (1)  2 + (0)  = 2 

and  by  (5.16) 

D'  = 1 


checking. 


D = 2 D'  + 4 mod  2b+l  by  (5.15) 

D = 2(1)  + 4 - 6 mod  17. 


JL, 
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Notice,  that,  this  way  of  performing  addition  results  in  an 
extra  level  of  add,  as  in  the  case  of  l's  complement  arithmetic. 


In  l's  complement  arithmetic,  the  output  carry  is 
added  to  the  LSB. 

In  this  new  mod (2^+1)  arithmetic,  one  takes  the  output 
carry,  complements  it  and  adds  it  to  the  LSB.  That  is, 
this  new  arithmetic  is  only  as  complex  as  the  l's  complement 
arithmetic. 

There  is  a small  amount  of  additional  complexity  due 

but  this  acts  only  as  an  inhibit  signal. 

+8+4+2+1  = 15  mod  17 
-8-4+2+1  =-9=8  mod  17 

-8-4+2-1  = -11  = 6 mod  17 


to  the  control  bit. 


Example : 
(1) 


01111  = 

00011  = 
(0)010 

00010  = 


(2)  01111  = 15  mod  17 


10000  = 0 
01111  = 15  mod  17 


MSB  = 1 inhibits  the  addition 


(3) 


01011 

(+) 00100 

Q^l 

10000  = 0 mod  17 


8-4+2+1  = 7 mod  17 
-8+4-2-1  = -7  mod  17 


129 


In  this  last  example  note  that  the  second  add  automatically 
produced  the  control  bit  indicator,  for  the  special  case 
of  zero. 

How  does  one  convert  from  a binary  coded  representation 
of  numbers  to  this  new  representation? 

The  code  conversion  between  a binary  representation 
and  this  new  code  falls  into  two  cases. 

Let  M be  a number  which  is  represented  in  both  codes. 

Let 


mb  “b-l  •**  mo 


(5.16) 


be  the  binary  representation  of  M. 
Let 


mb  "Vl  mo 


(5.17) 


be  the  new  representation. 

A 

Also,  let  M be  the  number  represented  by  interpreting 
(5.17)  as  a binary  code.  The  conversion  rules  are  as 
follows: 

/v 

1)  If  M = 0,  then  m^  * 1,  M * 0,  and  = 0 for 
K =*  0,  1,  ...  b-1. 

This  is  a special  case  and  is  done  separately. 

2)  If  M y 0,  then  it^  = 0 and 


M 


(2M  + 2)  mod  2U  + 1 


(5.18) 
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Conversion  from  the  new  code  to  binary  is  implemented 

A i_ 

by  forming  2M  + 2 and  comparing  this  sum  to  2 . 

b b 

If  the  sum  is  larger  than  2 , then  2°+l  is  subtracted 
to  give  the  proper  binary  representation  of  M. 

Example:  Let  = 17 

M = 01011  new  code  = 23-22+2+l  * 7 mod  17 

M = 8+2+1  = 11 

By  the  above  rule 

2M  + 2 = 2(11)  + 2 = 24  = 7 mod  17 

• and 


7 = 00111  binary  representation  of  M. 

If  the  binary  representation  is  given  the  sum 

M + 2b  - 1 (5.19) 

is  formed.  If  the  result  is  odd/  2b+l  is  subtracted;  and 
finally  this  result  is  right  shifted  one  place.  The  resulting 
b bits  are  the  b LSB's 

mb-l  "V- 2 * ‘ * mo  * 
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Example: 


17,  b = 4. 


Let  Ffc  = 

Given  the  binary  representation  of  M = 11111  = 14  mod  17 
to  obtain  new  code: 

(i)  form 

M + 2b-l  = 14+16-1  = 29 

the  result  is  odd  then  by  the  above  rule. 

(ii) 

29-17  = 12 

(iii) 

12 

= 6 00110  binary  representation 

(iv)  new  code  4 LSB's  of  the  binary  obtained  in 
(iii),  i.e., 

new  code  00110  - -8+4+2-1  = -3  = 14  mod  17 

Notice  that  the  code  described  so  far,  due  to  McClellan  [26], 
is  a special  case  of  a more  general  class  of  code  trans- 
lations discussed  by  Leibowitz  [27]. 

These  translations  involve  the  one-to-one  representation 
of  a number  A in  the  ring  of  integers  modulo  Ffc,  as  the 
binary  number  corresponding  to 

R A - 1 mod  F (5.20) 

where  R is  any  integer  in  the  ring  with  an  inverse  [27]. 


Notice  that  the  code  representation  of  McClellan  [26] 
corresponds  to  the  case  of  [27] 

R = 2b_1  + 1 mod  Ft  (5.21) 

Example:  Let  Ffc  = 2b  + 1 = 24  + 1,  b=4 

then 

R = 24-1  + 1 = 23  + 1 = 9 mod  F 

Using  (5.20),  for  A = 5 mod  17  the  code  will  be  given  by 

RA-1  = 9(5)  -1  = 44  = 10  mod  17 

IO^q  = 01010  code 

checking,  using  (5.7) 

01010  = +8 -4  + 2-1  = 10-5  = 5 mod  17. 

Notice  that  the  simplest  code  translation  corresponds  to 
R = 1,  for  any  value  of  b.  Leibowitz  [27]  concentrates  on 
the  resulting  binary  arithmetic  for  the  code  translation 
corresponding  to  R = 1. 

Recall  that  to  represent  all  integers  in  the  ring  modulo 
Ffc  requires  (b+1)  bits.  The  additional  bit  is  required  in 
order  to  represent  the  number  2b  = -1  mod  Ffc. 
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In  order  to  overcome  the  problem  of  performing  binary 
arithmetic  with  this  additional  bit,  one  allows  the  addi- 
tional bit  to  be  a 1 only  when  the  number  to  be  represented 
is  a zero. 

One  way  to  do  this  is  achieved  by  subtracting  1 from 
the  normal  binary  representation  of  every  integer  in  the 
ring  and  corresponds  to  the  above  set  of  code  translations 
with  R = 1. 

Example:  Let  Ffc  = 2b + 1 = 24 + 1 = 17,  b = 4 


Normal  value 

Binary  representation 

Diminished 
value  (R  = 

-1 

1) 

0 

00000 

1 

1 

00001 

2 

2 

00010 

3 

3 

00011 

4 

4 

00100 

5 

5 

00101 

6 

6 

00110 

7 

7 

00111 

8 

8 

01000 

9 (-8) 

( 

9 (-8) 

01001 

10  (-7) 

10  (-7) 

01010 

11 (-6) 

ll(-6) 

01011 

12 (-5) 

12 (-5) 

01100 

13 (-4 ) 

13 (-4) 

01101 

14  (-3) 

14  (-3) 

OHIO 

15 (-2) 

15 (-2) 

01111 

16 (-1) 

16  (-1) 

10000 

0 

5.22) 
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In  the  diminished  -1  number  representation  the  b-least 
significant  bits  (LSB’s)  indicate  the  normal  value  of  the 
number.  The  numbers  from  1 to  2^  are  represented  in  order 
by  the  binary  numbers  from  0 to  2^-1. 

Using  this  representation,  the  arithmetic  operations 
necessary  to  perform  convolution  by  FNT's,  will  be  discussed 
next. 

1.  Negation 

It  can  be  seen  from  (5.22)  that  each  of  the  negative 
b”  1 

numbers  (>2  = 8)  is  the  b-LSB's  complement  of  its 

positive  counterpart. 

Example:  Let  F = 17 

Diminished  -1  value  Binary  representation 

A = 6 00101 

-A  = -6  = 11  01010 

Notice  that  if  the  MSB  is  1,  the  negation  is  inhibited. 

Thus,  the  negative  of  a nonzero  number  in  the  diminished 
-1  representation  is  the  complement  of  its  b-LSB's. 

2 . Addition 

To  perform  addition  of  two  numbers  represented  as 
(A— 1 ) and  (B-l) 

(A-l)  + (B-l)  = (A+B-l)  -1 
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and  thus 


(A+B-l)  = [ (A-l)  + (B— 1 ) ] + 1 


(5.23) 


Since  the  (b+l)*"*1  bit  of  the  addends  is  used  only  to 
inhibit  addition  if  an  addend  is  zero,  addition  of  nonzero 
addends  involves  only  the  b-LSB's. 

Equation  (5.23)  indicates  that  a 1 must  be  added 
to  the  sum  of  two  diminished  (-1)  numbers  to  provide  a 
correct  result.  When  a carry  is  generated  from  the  b-bit 
sum,  a residue  reduction  modulo  F1  requires  the  subtraction 
of  a 1 since  2°  = -1  mod  and  no  corrective  addition  ic 
necessary.  Thus  to  add  to  numbers  in  diminished  -1 
representation : 

1)  If  the  MSB  of  either  addend  is  1,  inhibit  the 
addition  and  the  remaining  addend  is  the  sum. 

2)  If  the  MSB  of  both  addends  are  0,  ignoring  the  MSB, 
add  the  b-LSB's,  complement  the  carry  and  add  it  to 
the  b-LSB's  of  the  sum. 

The  (b+l)th  bit  or  MSB  of  the  resulting  sum  is  the  carry 
out  of  the  b*"*1  bit. 


Example : 


Let  F = 17,  b - 4 


1.  Diminished  -1  value 

add  3 

+ 

2 


5 


Binary  representation 

00010 

00001 

CSBSjr 

00100 
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2. 


add 


7 


00110 

10000 

00110 


3.  add  13 

+ 

9 


22  = 5 mod  17 


01100 

01000 

00100 

^*0 

00100 


4. 


add 


11 

+ 

6 


17  = 0 mod  17 


01010 


00101 

0LUi 

10000 


3 . Code  Translation 

Let  B represent  the  binary  representation  of  a given 
number  and  D its  diminished  -1  representation. 

To  perform  code  translation  from  binary  to  diminished 
-1  representation,  one  does  a diminished  -1  addition  of  B 
and  the  binary  representation  of  2b  - 1 [27]. 


Example: 


Let  F - 17,  b « 4. 


Binary  number  B = 

2b-l  » 


00101 

01111 

Qoioo 

Diminished  -1  value  D = 00100 
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i.e.,  the  binary  number  B = 00101  = 5 is  represented  in 
diminished  -1  representation  by 

D = 00100  = (4)  = (5-1). 


The  translation  from  diminished  -1  representation 
to  binary  representation,  is  performed  by  complementing  the 
MSB  of  D and  adding  it  to  the  b-LSB’s  [27]. 


Example : 


Diminished  -1- 
Binary 


D 

B 


= 00011 


3 


0 Diminished  -1  representation 
Binary  representation. 

4 . Subtraction 

One  can  perform  subtraction  in  the  dimished  -1 
arithmetic  by  negating  the  subtrahend  and  adding  it  to 
the  minuend. 


Example : 


d = Qooo 


000 
0 


B 


00000 


Example : 


subtract 


Diminished  -1 
8 


Binary  representation 
00111 

01010 

00000 

Q 

00061 
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5.  Multiplication  by  Powers  of  2 

In  performing  a multiplication  if  the  multiplier  or 
multiplicand  are  0,  as  detected  by  the  presence  of  a 1 in 
the  (b+l)th  bit,  the  multiplication  is  inhibited  and  the 
product  is  zero. 

To  perform  multiplication  of  diminished  -1  numbers 
by  powers  of  2,  notice  that: 

(A-l) 2 = ( 2 A- 1 ) - 1 

and  thus 


(2A-1)  = (A-l) 2 + 1 . (5.24) 

therefore,  each  multiplication  by  two  involves  a left  shift, 
ignoring  the  MSB,  and  a corrective  addition  of  a 1.  If  the 
bit  shifted  out  from  the  bth  position  is  a zero,  it  is  com- 
plemented and  shifted  into  the  LSB  in  order  to  accomplish 
the  addition  of  a 1.  If  this  bit  is  a 1,  a subtraction  of 
1 is  also  required  to  accomplish  a residue  reduction  (2b=  -1) . 
With  the  corrective  addition  of  +1,  these  cancel  out  and  a 
0 is  shifted  into  the  LSB. 

Thus  for  each  factor  of  2,  a left  circular  shift 
of  the  LSB’s  is  required  and  the  bit  circulated  into  the 
LSB  is  complemented. 
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T 


i 

I 

I 


Example:  Let  Ffc  = 17,  b = 4 

16x9  = 24  x 9 = 2x2x2x2x9  = 8 mod  17 


8 mod  17 


to  carry  out  convolution 
with  the  FNT  is  a general  multiplication  by  any  two 

J 

integers  modulo  Ffc. 

To  perform  a multiplication  of  the  numbers  A and 
B represented  as  (A-l)  and  (B-l)  in  diminished  -1  number 
representation  system,  notice  that 

(A-l)  (B-l)  = A* B - (A+B)  + 1 

= (A* B-l)  - (A+B-l)  + 1 


MSB 

* 


9 

2x9 

ojiopp 

o&feb 

4x9 

opooi 

8x9 

oloon 

1 

16  x 9 

OjOlll 

6 . General  Multiplication 

The  last  operation  required 


and  the  desired  result  is 

(A- B-l)  = (A-l) (B-l)  + (A+B-l)  - 1 (5.25) 

thus,  to  carry  out  such  an  operation,  ignore  the  MSB's, 
perform  a binary  multiplication  of  the  diminished  -1 
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representation  of  A and  B,  add  this  result  to  the  b-LSB's 
of  the  diminished  -1  addition  of  A and  B and  then  perform  a 
residue  reduction  by  a diminished  -1  subtraction  of  the 
b-MSB's  from  the  b-LSB's. 

Notice  that  this  particular  general  multiplication 
scheme  is  not  applicable  with  code  translations  other  than 
that  corresponding  to  R = ±1.  As  discussed  previously, 
if  the  MSB  of  either  the  multiplier  or  the  multiplicand  is 
1,  then  the  multiplication  is  inhibited  and  the  result  is 
set  to  zero. 

Example:  Let  Ffc  = 17,  b = 4. 

Multiply 

13  • 
x 

11 

143  = 7 mod  17 


binary  multiply 
01100 

01010 


011000 

0110000 


01111000 
binary  add  00110 
01111110 


diminished  -1  add 


01100 

01010 

Qono 

00110 


residue  1000 

reduction 


00110  7 mod  17. 
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An  alternate  multiplication  technique  can  also  be  considered. 
Assuming  that  the  numbers  are  determined  to  be  nonzero  and 
a multiplication  is  required,  a translation  to  normal  binary 
coding  is  performed.  Following  a binary  multiplication,  a 
residue  reduction  by  diminished  -1  subtraction  of  the  b- 
MS's  of  the  product  from  the  b-LSB's  is  carried  out.  The 
result  is  the  desired  product. 

Let  Ft  - 17,  b = 4 

= 7 mod  17 


translate 


translate 


Since  in  most  cases  the  translation  from  diminished  -1  to 
binary  will  be  simpler  and  faster  then  a general  binary 
or  diminished  -1  addition,  the  second  technique  is  the  most 
preferable. 

To  conclude  part  A of  this  section,  one  can  say  that 
Leibowitz  [27]  presents  a binary  arithmetic  applicable  to 


Qoili 


110 
0 


7 mod  17 


00110 


Example : 


multiply 


13 

c 

11 

143 
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the  exact  computation  of  the  FNT,  that  this  diminished  -1 
representation  is  mathematically  simpler  than  that  of 
McClellan  [26].  Also  the  hardware  presented  in  [26]  to 
compute  the  FNT,  and  discussed  in  part  B of  this  section, 
can  be  applicable  to  this  new  technique  with  the  exception 
of  the  code  translation. 

B.  SOFTWARE  AND  HARDWARE  REALIZATIONS  OF  THE  FNT 

The  Fermat  Number  Transform  has  been  proposed  as  an 
aid  to  the  rigid  and  precise  computation  of  convolution  for 
digital  filtering  applications.  Since  this  transform  does 
not  require  multiplications,  it  is  considerably  faster 
than  the  FFT  [17]. 

In  what  follows,  a discussion  of  a software  implementa- 
tion of  the  FNT,  due  to  Agarwal  and  Burrus  [4] , as  well  as 
the  description  of  a special  purpose  hardware  to  compute  the 
FNT,  realized  by  McClellan  [26]  , is  preselited. 

In  their  assembly  language  realization  of  the  FNT,  on 
the  IBM  370/155  computer,  Agarwal  and  Burrus  define  a 
binary  arithmetic  modulo  F = 2b+l,  b = 2fc. 

Notice  that  the  IBM  360/370  series  uses  a 32-bit  word 

length  for  fixed-point  arithmetic,  and  therefore  is  well 

suited  for  the  implementation  of  convolution  using  the  FNT 
32 

modulo  F5  = 2 +1. 

It  has  two's  complement  representation  of  negative 
integers,  i.e.. 


(-x)  is  represented  as  2 


32 


- x 


(5.26) 
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Since  in  arithmetic  mod  Ffc  one  wants  negative  integers 

32 

to  be  represented  as  a complement  of  2 + 1,  i.e. 

32 

(-x)  is  to  be  represented  as  (2  +l-x)  (5.21) 

One  adds  1 whenever  a negative  number  is  encountered  in 
data . 

As  noted  before  with  b bits,  the  quantity  2^  = -1  mod  F 

has  no  representation.  Thus  in  the  370/155,  one  cannot 
32 

represent  2 = -1.  If  a (-1)  is  encountered  in  the  data 

it  is  rounded  either  to  0 or  to  -2.  If  data  is  uncorre- 
lated, the  probability  that  (-1)  will  appear  after  an 

arithmetic  operation  during  the  various  stages  of  the  FNT  t 

* 

computation  is  roughly 

2~32  - 10'10  (5.28) 

To  add  two  32-bit  integers  modulo  F,.,  the  logical  add 
instruction  (ALR)  is  used  which  adds  the  two  integers  and 
sets  the  condition  code  depending  on  carry  or  no  carry. 

After  the  logical  add  instruction,  a conditional  branch  is 
taken  depending  on  the  condition  code.  If  the  condition 
code  indicates  a carry  bit  in  the  logical  add  operation,  one 
is  subtracted  from  the  result,  otherwise  it  is  left  unchanged. 

This  sequence  of  operations  completes  one  addition  modulo  F^. 

Similarly,  subtraction  modulo  F5  is  performed  using  a 
logical  subtraction  (SLR)  instruction  followed  by  a conditional 
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branch  instruction.  To  multiply  a number  by 


2K  mod  F5  , 0 < K < 32 

the  number  is  loaded  in  the  odd  register  of  an  even-odd 
fair  of  registers.  The  even  register  is  filled  with  zeros 
and  this  double  register  is  shifted  left  by  K positions 
using  a shift  left  double  logical  (SLDL)  instruction. 

After  the  shift  operation,  the  even  register  is  subtracted 
from  the  odd  register,  modulo  F^.  This  sequence  of  instruc- 
tions  completes  multiplication  by  2 mod  F^. 

To  multiply  a number  by 

2~K  mod  F5  , 0 < K < 32  , 

the  number  is  loaded  in  the  even  register  of  an  even-odd 
pair  of  registers,  the  odd  register  is  filled  with  zeros 
and  this  double  register  is  shifted  right  by  K positions 
using  a shift  right  double  logical  (SRDL)  instruction. 

After  the  shift  operation,  the  odd  register  is  subtracted 
from  the  even  register,  modulo  F^ . This  sequence  of 

_K 

operations  completes  multiplication  by  2 mod  F^.  To 
multiply  two  numbers  mod  F,.  requires  a somewhat  larger 
number  of  operations.  First  one  determins  the  signs  of 
the  two  integers  and  multiply  the  signs.  If  any  of  the 
numbers  is  detected  as  negative,  its  absolute  value  mod  F^ 
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is  taken  by  taking  its  two's  complement  and  adding  one  to 
it.  Their  absolute  values  are  multiplied  (MR)  to  obtain 
a 64-bit  double-register  product,  then  the  even  register 
(higher  order  32  bits)  is  subtracted  from  the  odd  register 
(lower  order  32  bits)  mod  F^..  Finally,  if  the  product  of 
signs  was  detected  negative,  one  multiplies  the  result  by 
(-1)  by  taking  its  two's  complement  and  adding  one  to  it 
[5.27]  . 

Assembler  subprograms  were  written  [4]  to  compute  Fast  < 

Fermat  Number  Transforms  and  inverse  Fermat  number  transforms 
for  an  sequence  length  from  2^  to  2^,  taking  x as  a power  of  < 

2.  A decimation  in  frequency  algorithm  with  normally 
ordered  input  and  bit  reversed  output  was  used  for  the 
computation  of  the  Fast  Fermat  Number  Transform  [4] . A 
decimation  in  time  algorithm  with  bit  reversed  input  and 
normally  ordered  output  was  used  for  the  computation  of  the 
fast  inverse  FNT. 

For  both  of  these  subprograms , the  only  multiplications 
required  were  by  a power  of  2,  which  were  implemented  as 
discussed  above.  Two  more  subprograms  were  written  to 
compute  fast  FNT' s and  inverse  FNT's  for  length  - 128 
sequences,  using  a = /2  given  by  (4.16). 

Multiplications  required  to  multiply  the  two  transforms 
are  general  multiplications  mod  F,.  and  are  performed  as 
discussed  earlier. 

To  cyclically  convolve  sequences  longer  than  128,  two- 
dimensional  convolution  schemes  were  used. 
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One  such  scheme  was  2 by  128  convolutions  for  length 
256  sequences  discussed  in  section  10  of  [18] . Another 
program  was  written  to  convolve  long  one-dimensional 
sequences  using  two-dimensional  FNT,  as  discussed  in 
Appendix  B.  In  Section  6,  results  of  the  comparison  with 
the  FFT  are  presented. 

The  special  attributes  of  the  FNT  led  several  researchers 
to  consider  seriously  special-purpose  hardware  implementa- 
tions of  the  transform.  The  machine  constructed  by  McClellan 
[26] , is  an  implementation  of  a 64-point,  16-bit  FNT 
(modulo  F4  = 2 ^ + 1).  This  hardware  system  applies,  the 
second  coding  scheme,  described  in  part  A of  this  section, 
to  the  logic  design  of  the  butterfly  of  the  FNT  algorithm. 

The  fast  FNT  algorithm  implemented  is  a radix-2  constant 
geometry  decimation  in  frequency  (DIF)  decomposition  of 
the  FNT. 

Fig.  1 shows  a flow  diagram  of  this  algorithm  for  a 
16-point  transform  [38] . Although  the  constant  geometry 
structure  does  not  allow  in  place  calculation  of  the  trans- 
form, it  simplifies  the  memory  addressing  because  the  addressing 
does  not  vary  from  stage  to  stage.  The  price  one  pays  for 
this  simplification  is  a doubling  of  the  memory  size 
required  for  the  transform.  However,  128  words  per  chip  is 
a convenient  level  of  integration  with  ECL  (emiter  coupled 
logic) , thus  making  the  constant  geometry  structure 
attractive  [26] . 
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FIGURE  1.  Flow  Diagram  of  a Radix-2,  16  Point,  Constant 
Geometry  FFT  Algorithm  Using  the  Decimation 
in  Frequency  Structure 
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Fig.  2 is  a block  diagram  of  the  complete  system  showing 
the  four  major  subsystems.  The  computational  element  (CE) 
is  a radix-2  DIF  butterfly  for  the  fast  FNT  algorithm;  the 
memory  element  contains  128  x 17-bit  words  for  use  as  inter- 
mediate storage  during  the  computation  of  the  transform; 
the  control  element  is  a hardwired  implementation  of  the 
fast  FNT  algorithm  [26];  and  the  input/output  (I/O)  section 
provides  the  interface  with  the  fast  digital  processor  (FDP) . 
McClellan  states  that  the  goal  in  building  this  hardware  was 
to  construct  a CE  that  would  operate  at  a clock  rate  of  40 
MHz.  In  order  to  achieve  this  speed,  ECL  10K  circuits  were 
used.  The  basic  gate  in  this  logic  family  as  a propagation 
delay  of  2 ns,  and  thus  these  circuits  are  well-suited  for 

very  high-speed  systems.  Even  with  such  high  speed  logic 
t 

circuits,  two  levels  of  reclock  and  fast  carry  addition  were 
used  in  the  CE  to  realize  a working  system  that  runs 
reliably  at  38  MHz  [26] . 

Fig.  3 shows  a functional  diagram  of  the  butterfly 
which  consists  of  an  adder,  a subtracter,  a rotator,  input 
buffer  registers  RA  and  R0 
Ry  and  an  output  register  Rz-  Register  transfers  are  made 
at  each  clock  pulse,  so  that  data  are  always  flowing  through 
the  CE  as  would  be  the  case  in  a pipelined  fast  FNT. 

As  the  timing  diagram  in  Fig.  4 shows,  the  output  of 
the  butterfly  is  only  written  into  memory  from  Rz  at  t^  and 
tg.  During  the  other  clock  epochs  the  contents  of  Rz  may 
be  changing  but  this  does  not  affect  the  algorithm. 


, reclock  registers  Rw,  R^  and 
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FIGURE  2.  Block  Diagram  of  the  64-Point  FNT  Hardware 
System  Illustrating  the  Data  Flow  Between 
the  Major  Subsystems 
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FIGURE  3 
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A+B  and 
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FIGURE  4 . Timing  Diagram  for  the  Internal  Clocking 
Operation  of  the  FNT  Butterfly.  Register 
Transfers  at  Each  Clock  Pulse  are  Also  Shown 
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Recall  that  in  McClellan's  code,  the  rule  for  addition  of 
two  nonzero  numbers. 


and 


A ^a16  a15  * * * ao] 


B [b16  b15 


. bQ]  is  as  follows: 


STEP1 : Add  the  16  LSB's  of  A and  B with  the 

carry  in  equal  to  zero 

STEP2 : Complement  the  carry  out  from  step  1 and 

add  it  to  the  sum  of  step  1. 


If  either  A or  B is  zero  (i.e.,  b16  or  alg  = 0),  then  the 
carry  must  be  inhibited.  Finally  if  both  A and  B are  zero 
the  MSB  of  the  sum  is  set  to  one.  Fig.  5 shows  a realization 
of  the  addition  process.  The  structure  of  Fig.  5 is  ineffi- 
cient in  two  respects.  First  of  all,  two  16-bit  adders  are 
required,  although  the  second  one  is  simple  because  one 
input  is  zero.  Secondly  the  addition  is  very  slow  because 
the  carry  must  propagate  through  the  16-bit  adders.  The 
use  of  carry  look  ahead  (CLA)  logic  as  in  Fig.  6 will 
improve  both  situations.  For  details  of  the  implementation 
using  ECL  building  blocks  see  McClellan's  paper  [26]. 

Notice  that  the  subtracter  A-B,  can  be  implemented 
by  complementing  B and  adding  it  to  A. 
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iprovement  of  the  Fermat  Number  Adder  by 
le  Introduction  of  CLA  Logic  to  Produce 
ie  End  Around  Carry  Required  in  Fig.  5 
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The  addition  A+B  completes  the  calculation  of  one  output 
of  the  computational  element  (see  Fig.  3) . The  result  is 
held  in  the  register  Rw  and  then  is  moved  to  Rz  to  be  written 
back  into  memory.  For  the  other  output  of  the  CE,  the 
quantity  A-B  is  stored  in  the  reclock  register  Rx  for 
subsequent  rotation  by  a power  of 

cl  - /2  given  by  (4.16),  , 


a = /2  = 2b/4(2b/2-l)  = 216/4 (216/2-l)  - 24(28-l)  = 212-24 

(5.28) 

The  rotation  by  ^2^  is  split  into  two  stages. 

In  the  first  stage,  the  quantity  X = A-B  is  multiplied 
by  /2  = 2x2  - 24  if  K is  odd.  The  H multiplier  is  merely 
a subtracter  (5.28). 

A 2:1  multiplexer  at  the  output  of  the  subtracter 
selects  whether  the  input  is  to  be  multiplied-  by  /I  or 
by  1,  and  is  controlled  by  the  LSB  of  K [26]  . The  result 
of  this  calculation  is  stored  in  the  reclock  register  R^. 

The  second  stage  of  the  rotation  is  a multiplication  by  a 
power  of  2,  namely  [K/2] . ([  ] denotes  the  greatest 

integer  function.)  This  multiplication  is  implemented  as 
a 16:1  multiplexer  controlled  by  the  upper  four  bits  of 
the  binary  representation  of  the  power  of  2.  The  shifting 
network  is  followed  by  a 2:1  multiplexer  which  selects 
which  butterfly  output  (A+B  or  2K*y)  is  to  be  stored  in 
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Rz  and  then  written  back  into  memory.  This  multiplexer 
is  controlled  by  t3  and  its  output  is 

(2K  • Y)  F3  + W • t3 

where  Y and  W are  the  contents  of  Ry  and 

McClellan  states  that  the  logic  for  the  computational 
element  consists  of  90  IC's,  all  of  which  are  located  on 
one  board  (18x16  in.). 

In  Section  VI,  a comparison  between  the  complexity  of 
various  basic  operations  involved  in  computing  Fermat 
Number  Transforms  vis-a-vis  the  FFT,  in  software  as  well  as 
hardware  implementations,  will  be  presented.  The  software 
and  hardware  implementations  discussed  there  will  be  those 
described  in  this  section.  The  results  will  show  only  the 
efficiency  of  these  implementations,  not  all  the  possibili- 
ties of  NTT  in  general. 


R^  respectively. 


157 


VI.  FERMAT  NUMBER  TRANSFORM  VERSUS  FFT 


The  FNT  provides  an  efficient  and  error-free  means  of 
computing  cyclic  convolutions.  The  purpose  of  this  section 
is  to  compare  this  method  with  the  standard  implementation 
of  convolution.  Results  obtained  both  by  Agarwal  and 
McClellan,  respectively  in  software  and  hardware  implemen- 
tation of  the  FNT,  are  presented. 

Computation  of  the  FNT  of  length  N requires  on  the  order 

of  N log2  N additions,  bit  shifts  and  subtractions  but  no 

multiplications.  The  only  multiplications  required  for 

our  FNT  implementation  of  cyclic  convolution  are  the  N 

multiplications  required  to  multiply  the  transforms.  This 

is  a very  efficient  technique  for  computing  convolution, 

but  unfortunately,  the  maximum  transform  length  for  an  FNT 

is  proportional  to  the  word  length  of  the  machine  used. 

Agarwal  and  Burrus  [17]  showed  that  a very  practical  choice 

32 

of  a Fermat  number  for  this  application  is  F5  = 2 +1,  and 

that  the  FNT  mod  F^  can  be  implemented  on  a 32-bit  machine, 
namely  the  IBM  360/370  series  computers. 

Suppose  one  wants  to  calculate  the  convolution  of  two 
sequences  x(n)  and  h(n)  having  b-^  and  b2  bit  representations, 
respectively,  and  that  the  sequence  length  of  both  is  N. 

Then  the  output  y(n),  given  by  (3.2),  would  have  at  the 
most  [17]  a 


b^  + b2  + log2N 


(6.1) 
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bit  representation.  To  obtain  the  correct  result  b,  the 
number  of  bits  of  the  output,  should  be  [17] 

b _>  b1  + b2  + log2N  (6.2) 

Notice  that  a better  bound  on  the  output  can  be  achieved 
[4]  . 

Roughly  speaking,  one  needs  twice  the  number  of  bits  to 
carry  out  the  convolution  using  the  FNT  as  compared  to  the 
fixed  point  FFT  implementation  of  the  convolution.  But  in 
the  DFT , every  data  point  is  treated  as  a complex  number 
[17]  and  therefore  requires  two  words,  one  for  the  real  part 
and  one  for  the  imaginary  part.  Thus  in  effect  the  hard- 
ware requirement  for  two  transforms  are  about  the  same. 
Although  for  real  data  it  is  possible  to  make  use  of  the 
symmetry  properties  of  the  DFT,  they  require  extra  computa- 
tion and  for  the  purpose  of  comparison  it  will  be  ignored, 
although  Agarwal  and  Burrus  had  taken  this  into  account  for 
their  IBM  370/155  implementation.  In  fact,  they  assumed, 
in  the  FFT  implementation,  each  data  point  is  represented 
by  a b/2  bit  real  part  and  a b/2  bit  imaginary  part.  One 
b/2  bit  complex  addition  is  equivalent  [17]  to  two  b/2  bit 
real  additions,  which  are  equivalent  to  a b-bit  addition 
modulo  Ffc.  Thus  the  complexity  of  addition/subtraction  is 
the  same  in  both  the  transforms.  Similarly,  it  can  be 
shown  [4]  that  a b/2  bit  complex  multiplication  is  equivalent 
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to  a b-bit  multiplication  modulo  Ffc.  Computation  of  the 
FNT  requires  multiplications  by  a power  of  2,  which 
implemented  as  bit  shifts  and  subtractions  become  much 
simpler  operations  compared  to  complex  multiplications 
required  in  the  FFT  implementation. 

To  compute  a length  N fast  FNT,  N log2N  additions/ 
subtractions,  and  (N/2)  log2(N/2)  "multiplications"  by  some 
powers  of  2 which  are  implemented  as  bit  shifts  and  sub- 
tractions. To  compute  the  convolution  using  the  FFT,  most 
of  the  time  is  taken  in  computing  the  complex  multiplica- 
tions required  to  compute  the  complex  multiplications 
required  to  compute  the  transforms.  A comparison  with  the 
FNT  reveals  that  these  complex  multiplications  are  replaced 
by  bit  shifts  and  subtractions  which  are  much  faster  opera- 
tions. This  results  in  considerable  computational  savings 
in  the  implementation  of  convolution. 

The  convolution  required  to  multiply  the  two  transforms 
is  about  the  same  for  both  the  implementations. 

To  convolve  long  sequences  using  the  two-dimensional 
FNT  (Appendix  B) , the  computational  effort  increases  by, 
at  most,  a factor  of  2 [4].  Still,  the  FNT  implementation 
of  convolution  is  much  faster  as  compared  to  the  FFT 
implementation. 

Fermat  number  transforms  have  some  additional  advantages 
over  the  FFT.  First,  the  FFT  implementation  requires  storing 
all  the  powers  of  a (see  3.7  ) requiring  a significant  amount 
of  storage  which  may  be  an  important  factor  for  a small 
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minicomputer  or  a special  purpose  hardware  implementation. 
Second,  fixed-point  FFT  implementation  introduces  a signifi- 
cant amount  of  round-off  noise  at  the  output,  6-8  bits 
depending  on  the  data  [39].  This  degrades  the  signal- 
to-noise  ratio  during  the  filtering  operations.  The  FNT 
implementation  is  error  free,  the  only  source  of  error  is 
input  A/D  quantization. 

Timings  for  various  implementations  of  convolution  and 
their  comparison  with  the  FFT  implementation  are  shown  by 
Agarwal  and  Burrus  [4] ; 


N 

FFT 

msec 

FNT 

msec 

1 

32 

16 

3.3 

i 

i 

64 

31 

7.4 

j 

128 

60 

16.6  * 

; 

256 

123 

40.0  ** 

i 

256 

123 

80.0  *** 

(6.3) 

512 

245 

166.0  *** 

< 

1024 

530 

340.0  *** 

. | 

2048 

1260 

720.0  *** 

* using  a = /2 

**  using  2 by  128  convolution 

***  using  the  two-dimensional  FNT 


To  compute  these  timings  it  is  assumed  that  the  transform 
of  the  h sequence  has  been  precalculated.  Thus  timings 
are  for  computing  x transforms,  multiplications  of  the 
transforms,  and  their  inverse  transforms  [4]. 
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For  cyclic  convolution  lengths  up  to  256,  the  FNT 
implementation  of  convolution  have  a factor  of  3 to  5 
speed  advantage  over  the  FFT  implementation.  It  takes 
roughly  13-14  Msec  to  compute  one  butterfly  in  the  compu- 
tation of  fast  Fermat  number  transforms  [4],  Timings  for 
the  computation  of  fast  Fermat  number  transforms,  are  fairly 
well  modeled  by  multiplying  this  time  by  the  number  of 
butterflies  required  in  the  computation.  An  assembler 
subprogram  was  written  to  compute  one  butterfly  for  the  FFT 
algorithm  [4].  The  timing  for  this  was  68  Msec,  and  this 
explains  the  difference  in  the  timings  of  the  FFT  and  FNT. 
Agarwal  claims  that  since  assembler  subprograms  were  not 
optimized  for  time,  it  should  be  possible  to  further  reduce 
their  timings  by  10  to  20  percent.  Also,  to  compute  one 
butterfly  of  the  fast  FNT's,  three  add/subtract  logical 

instructions  are  required  and  since  the  arithmetic  is  done 
32 

mod  2 +1,  these  three  instructions  are  followed  by  three 

branch  instructions  (to  take  into  account  the  carry  bit) . 

32 

If  the  hardware  was  designed  to  do  arithmetic  mod  2 +1, 

these  instructions  could  have  been  avoided  resulting  in  a 
significant  reduction  in  the  computation  time  [4],  One 
objective  of  the  construction  of  FNT  convolver  by  McClellan 
was  to  evaluate  the  total  system  cost  of  an  FNT  convolver 
versus  a pipeline  FFT  convolver,  primarily  in  the  speed 
regime  applicable  to  radar  signal  processing. 

The  signal  bandwidths  encountered  in  radar  signal 
processing  (10-30  MHz)  require  a pipeline  architecture  for 
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either  the  FNT  or  FFT  [40] . Typically,  the  overlap-save 
version  of  high-speed  convolution  is  employed  for  real-time 
processing  of  the  radar  returns  and  a 50  percent  overlap  of 
the  input  data  is  common  [26].  Furthermore,  the  length  of 
the  convolution  to  be  implemented  is  assumed  to  be  large 
(e.g.,  512  or  greater).  Two  cases  will  be  considered: 
a length  1024  convolution  of  real  data  and  a length  1024 
convolution  of  complex  data. 

Four  measures  of  hardware  complexity  are  the  basis  of 
comparison  [26] : 

the  number  of  butterflies  for  output  point, 
the  number  of  reference  spectrum  multiplies  for 

output  point,  i 

the  total  amount  of  interstage  delay  line  memory  in 

the  forward  and  inverse  transforms, 

and  the  total  amount  of  reference  spectrum  memory. 

The  FFT  implementation  will  be  considered  first.  For  either 
real  or  complex  data,  it  is  assumed  that  the  FFT  implemen- 
tation employs  an  11-stage  radix  2 pipeline  FFT  in  both  the 
forward  and  inverse  transform.  Notice  that  for  real  data, 
it  is  possible  to  do  a length  N transform  with  cne  length 
N/2  transform  and  some  overhead  to  combine  the  real  and 
imaginary  parts  [38] . However  the  overhead  amounts  to  an 
additional  butterfly  so  there  is  little,  if  anything  to  gain 
by  using  this  fact  in  a pipeline  FFT  [26] . 

For  the  case  of  a length  2^  = 2048  pipeline  FFT  convolver 
the  number  of  butterflies  per  output  point  is 


163 


2 log2N 


22 


(6.4) 


assuming  50  percent  convolution  overlap.  Likewise,  two 
reference  spectrum  multiplies  must  be  done  per  output  point 
[26] . The  amount  of  interstage  delay  line  memory  can  be 
calculated  from  the  formula  [26] 


I DM  = | (r+1) 


(6.5) 


where  r is  the  radix  of  the  transform. 

Thus,  for  two  radix-2  pipelines,  the  total  is 

I DM  = |(2  + 1)*2  = 3N  = 6144  = 6K 

words  of  memory.  Finally,  the  reference  spectrum  requires 
2K  words  of  memory  [26]  . A pipeline  FNT  structure  is 
identical  to  the  pipeline  FFT  except  in  the  butterfly  where 
rotation  by  ei2lTK/N  (in  the  FFT  case)  is  replaced  by  multi- 
plication by  ^2^. 

Thus  many  of  the  results  quoted  above  are  applicable  to 
the  FNT.  Since  the  FNT  naturally  processes  real  input  data, 
the  cases  of  real  and  complex  convolution  require  different 
realizations.  In  both  cases,  however,  a two-dimensional 
implementation  of  the  convolution  is  required  [311 . 

The  length  1024  convolution  of  real  signals  can  be 
implemented  with  64  x 64  transforms  [26] . 


\ 

1 


1 


i 
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McClellan  claims  that  for  the  FNT  convolver  processing 
real  data,  the  following  results  apply: 

12 

total  number  of  butterflies  = 9 x 2 36  butterflies  per 

output  point 

. . 12 

total  reference  function  multiplies  =2  4 multiplies  per 

output  point 

total  interstage  delay  memeory  = 6-4  K 

amount  of  reference  function  memory  = 4K. 

If  the  signal  to  be  convolved  is  complex  then  one  possible 
implementation  is  to  handle  the  real  and  imaginary  parts  of 
data  separately.  The  number  of  butterflies  per  output 
point,  the  interstage  delay  memory,  and  the  reference  spec- 
trum are  all  doubled.  However,  the  number  of  real  multipliers 
for  the  reference  function  multiply  is  quadrupled  because 
the  multiplications  are  now  complex  [26]  . These  results 
can  be  summarized,  assuming  1024  convolutions,  by 


FFT  Real  or 
Complex  Data 

FNT  Real  • 
Data 

FNT  Complex 

Data 

Butterflies 

22 

36 

72 

Reference 

Multipliers 

2 

2 

16 

Interstage 
Delay  Memory 

6K 

complex  words  a 

6 • 4K 

real  words 

12-8  K 

b real  words 

Reference 

2K 

4K 

8K 

Spectrum 

Memory 

complex  words 

real  words 

real  words 

a One  complex  word  will  contain  approximately  27  bits 
for  typical  high-precision  radar  applications. 

b One  real  word  contains  33  bits  for  FNT. 
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Notice  that  the  FNT  always  requires  more  memory  and  more 
computational  elements  than  the  FFT.  Hardware  savings  are 
possible  because  most  of  the  hardware  cost  of  the  FFT  is 
^ concentrated  in  the  butterfly  elements  (up  to  80  percent) 

[26]  and  because  the  FNT  butterfly  requires  from  one-third 
to  one-sixth  the  hardware  of  an  FFT  butterfly.  These 
remarks  apply  to  the  FNT  where  the  data  to  be  convolved  are 
real,  but  where  the  data  are  complex,  the  situation  becomes 
much  worse  because  all  measures  of  hardware  complexity  are 
increased  by  a factor  of  2 or  4 . 

McClellan  conclude  that  the  FNT  is  a useful  alternative 
to  the  FFT  if  the  signal  to  be  filtered  is  real  and  the 
computational  elements  are  a major  part  of  the  overall 
system  cost  as  in  a pipeline  architecture.  Furthermore, 
for  short  length  convolution  (e.g.,  length  64)  and  for 
two-dimensional  convolution  the  savings  may  be  significant. 
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VII.  CONCLUSIONS 


It  has  been  shown  that,  by  working  in  a finite  field  or 
ring  of  integers  modulo  M,  a large  class  of  transforms 
exist  that  have  the  cyclic  convolution  property,  i.e.,  the 
transform  of  cyclic  convolution  of  two  sequences  is  equal 
to  the  product  of  their  transforms.  These  transforms  are 
called  Number  Theoretic  Transforms  (NTT's)  and  they  are  a 
computationally  efficient  approach  to  performing  the  dis- 
crete convolution  function.  These  NTT's  are  truly  digital 
transforms,  taking  into  account  the  quantization  in  amplitude 
and  the  finite  precision  of  digital  signals.  They  bear  the 
same  relation  to  digital  signals  as  the  DFT  does  to  discrete- 
time or  sample  data  signals  and  the  Fourier  or  Laplace 
transforms  do  to  continuous  time  signals. 

When  working  with  digital  machines,  the  data  are  avail- 
able only  with  some  finite  precision,  and  therefore,  without 
loss  of  generality,  the  data  can  be  considered  to  be  inte- 
gers with  some  upper  bound.  To  compute  convolution  in  this 
digital  domain,  operations  in  the  complex  number  field  of 
the  continuous  domain  can  be  imitated  in  a finite  field,  or 
more  generally,  in  a finite  ring  of  integers  under  additions 
and  multiplications  modulo  some  integer  M,  with  an  integer 
a of  order  N,  replacing  e J(2tt)/n  t^e  Discrete  Fourier 
Transform  (DFT) . 

In  this  ring,  when  two  integer  sequences  x(n)  and  h(n) 
are  convolved,  the  output  integer  sequence  y(n)  is  congruent 
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to  the  conventional  convolution  of  x(n)  and  h(n),  modulo  M. 

In  the  ring  of  integers  modulo  M,  conventional  integers 
can  be,  unambiguously,  represented  if  their  absolute  value 
is  less  than  M/2.  If  the  input  integer  sequence  x(n)  and 
the  filter  sequence  h(n)  are  so  scaled  that  |y(n) | never 
exceeds  M/2,  one  would  get  the  same  results  by  implementing 
convolution  in  the  ring  of  integers  mod  M as  that  obtained 
with  normal  arithmetic. 

By  special  choices  of  the  length  N,  the  mod  M,  and 
the  value  a,  it  is  possible  to  have  transforms  that  need 
only  word  shifts  and  additions  but  no  multiplications, 
that  have  an  FFT  type  algorithm,  that  do  not  require  storage 
of  complex  values  of  a and  that  have  no  round-off  errors. 

It  has  been  shown  that  Mersenne  transforms  with 
M = p = 2q-l,  q a prime,  and  a = -2,  have  the  transform 
length  equal  to  N = 2q  and  therefore  do  not  have  an  FFT 
type  fast  computational  algorithm. 

The  best  known  number  theoretic  transform  is  the 

2^ 

Fermat  Number  Transform  (FNT) , where  M = 2 + 1,  t a positive 

integer.  For  FNT's  with  a prime  or  composite  modulus  it 
was  verified  that  a = 2 or  a power  of  2 is  possible,  for 
sequences  up  to  N = 2t+^.  This  is  a very  desirable  situa- 
tion since  N is  highly  composite  allowing  an  FFT  type 
algorithm  and  all  multiplications  by  powers  of  a are  simple 

word  shifts.  If  a = /2  is  used  then  sequences  of  length 
t+2 

N = 2 are  possible,  thus  increasing  the  maximum  sequence 
length  permissible. 
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Assembler  programs  on  the  IBM  370/155  computer,  written 
by  Agarwal,  showed  that  for  cyclic  convolutions  length 
up  to  256,  the  FNT  implementations  of  convolution  have  a 
factor  of  3 to  5 speed  advantage  over  the  FFT  implementa- 
tions. The  reasons  for  the  speed  up  are: 

1 - The  Fermat  number  transform  requires  no  multiplica- 

tions and,  therefore,  the  implementation  of 
convolution  requires  only  N multiplications  for 
an  N point  convolution.  The  number  of  additions 
and  subtractions  (together)  for  a convolution  is 
2N  log2N  and  there  are  N log2  N required  "multipli- 
cations" by  a power  of  two. 

2 - Only  real  operations  are  required.  This  buys  about 

two  to  one  savings  over  the  FFT  requirements. 

3 - The  Fermat  number  transform  is  able  to  compute  an 

exact  convolution  thus  allowing  a program  to  avoid 
the  need  for  either  floating  point  arithmetic  or 
overflow  checks  or  other  precautions. 

The  computation  required  to  multiply  the  two  transforms  is 
about  the  same  for  both  implementations.  To  convolve  long 
sequences  using  two  dimensional  FNT,  the  computational  effort 
increases  by,  at  most,  a factor  of  2.  Still  the  FNT  imple- 
mentation of  convolution  is  much  faster  as  compared  to  the 
FFT  implementation. 

Fermat  number  transforms  have  some  additional  advantages 
over  the  FFT.  First,  the  FFT  implementation  requires  storing 
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all  powers  of  a requiring  a significant  amount  of  storage 
which  may  be  an  important  factor  for  a small  minicomputer 
or  a special  purpose  hardware  implementation. 

Second,  fixed-point  FFT  implementation  introduces  a 
significant  amount  of  round-off  noise  at  the  output  6-8 
bits  depending  on  the  data  [38] . This  degrades  the  signal- 
to-noise  ratio  during  the  filtering  operations.  The  FNT 
is  error  free,  the  only  source  of  error  is  input  A/D 
quantization. 

In  the  realm  of  radar  signal  processing  the  potential 
for  higher  throughput  is  worth  exploring.  For  this  reason 
McClellan  designed  and  constructed  a small  prototype  FNT- 
convolver.  An  important  element  in  the  design  of  the  FNT- 
convolver  was  a new  coding  scheme  for  the  data,  although 
simpler  codes  are  possible.  The  experience  derived  from 
designing  and  building  this  hardware  serves  as  the  basis  for 
estimates  of  the  site  of  large  pipeline  FNT  convolvers,  for 
use  in  radar  matched  filtering  applications.  The  result  of 
the  hardware  comparison  of  the  FNT  versus  a pipeline  FFT, 
indicates  that  the  anticipated  savings  of  the  FNT  can  be 
realized  for  small  systems  (e.g. , length  64  convolution), 
when  the  signal  to  be  filtered  is  real.  However,  in  larger 
systems  where  one  must  use  two-dimensional  Convolution  to 
implement  one  dimensional  convolution,  the  savings  in 
multiplier  hardware  are  offset  by  increased  transform  size 
and  the  corresponding  increase  in  memory  size  and  reference 
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spectrum  multiplier  hardware.  In  this  case,  when  the  signal 
to  be  filtered  is  real,  the  FNT  still  offers  a potential 
savings  in  the  amount  of  hardware  versus  the  FFT.  When  the 
signal  is  complex  valued  the  amount  of  FNT  hardware  approxi- 
mately doubles  and  is  much  greater  than  the  pipeline  FFT. 

In  the  implementation  of  two-dimensional  convolution  there 
is  no  penalty  due  to  increased  memory  size  and  the  savings 
in  multiplier  hardware  will  translate  into  savings  for  the 
overall  convolver  system. 

It  was  shown  that  the  main  drawbacks  of'  FNT's  is  a 
rigid  relationship  between  word  length  and  the  obtainable 
transform  length  and  a limited  choice  of  possible  word  lengths. 
This  last  point  is  especially  significant,  since  FNT's  are 
restricted  to  word  sizes  equal  to  q = 2fc,  t an  integer.  As 
q increases  very  rapidly  with t , the  choice  of  possible  word 
lengths  is  very  limited,  and  most  practical  digital  filtering 
applications,  when  implemented  with  FNT,  are  constrained  to 
word  lengths  of  16,  32  or  64  bits.  If  the  dynamic  range 
required  for  convolution  does  not  correspond  to  q = 2t 
choosing  the  next  higher  value  of  q may  result  in  a signifi- 
cant waste  of  computing  power. 

Various  solutions  to  these  problems  involving  either 
two-dimensional  techniques,  the  use  of  "the  Chinese  Remainder 
Theroem",  or  other  NTT's  has  been  discussed. 

Along  these  lines,  it  was  shown  that  transforms  over 

2 

the  Galois  Field  GF(p  ),  can  be  found  which  do  not  introduce 
round-off  errors  and  which  can  be  used  to  compute 
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information-lossless  convolutions  of  sequences  of  complex 
numbers,  by  a FFT  type  algorithm.  A disadvantage  of  this 
transform  is  that  multiplications  by  powers  of  the  primitive 
element  (a)  is  not  as  simple  as  in  Mersenne  or  FNT's. 

It  was  verified  that  a Fourier-like  transform  in  GF(p) 
where  p is  a prime  of  the  form  An  = 3x2n+l,  n a positive 
integer,  is  possible.  This  transform  increases  the  variety 
of  methods  and  the  digital  word  lengths  that  can  be  used 
for  computing  the  convolution  of  integers  beyond  the  above 
mentioned  Fermat  or  Mersenne  Number  Transforms.  Also,  a 
special  NTT  that  can  be  computed  using  a high-radix  fast 
Fourier  type  algorithm,  defined  on  arithmetic  modulo  primes 
of  the  form  (2n-l)2n+l,  was  discussed. 

Complex  Mersenne  transforms  that  can  be  computed  without 
multiplications  were  presented.  These  transforms  are  very 
promising  for  computing  convolutions  because  they  can  be 
partly  computed  with  the  FFT  type  algorithms  and  some  of 
the  operations  can  be  performed  on  words  of  reduced  length. 
Complex  Mersenne  transforms  also  have  the  advantage  of 
permitting  operations  on  transform  lengths  and  convolution 
lengths  up  to  four  times  longer  than  is  possible  with 
conventional  Mersenne  transforms. 

These  results  have  been  extended  to  cover  the  case  of 
transforms  oerating  in  a ring  modulo  a pseudo  Mersenne 
number  or  submultiple  of  such  a number.  It  was  verified  that 
some  of  these  transforms  have  a highly  composite  transform 
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length  and  therefore  can  be  computed  with  an  efficient 
FFT-type  algorithm.  In  the  same  way  pseudo  FNT's  defined 
in  a ring  which  is  a submultiple  of  a Fermat  number  and 
can  be  considered  as  a generalization  of  FNT's.  allow  a much 
wider  choice  of  possible  word  lengths,  and  therefore  are 
well  adapted  for  evaluating  convolutions.  As  an  extension 
the  case  of  complex  transforms  was  considered. 

Finally,  complex  pseudo  FNT's  were  presented,  that 
allow  a length  double  that  of  FNT's,  and  part  of  the  calcu- 
lations to  be  performed  on  words  of  reduced  length.  Some 
of  these  transforms  have  a highly  composite  number  of  terms 
and  are  therefore  well  suited  for  computing  complex  convo- 
lutions with  an  efficient  FFT-type  algorithm. 

Recently  [32]  a number  of  three  bit  primes  have  been 
discovered  which  make  possible  very  efficient  fast  number 
transforms  approaching  that  of  the  FNT,  but  permitting  much 
larger  transform  lengths  suitable  for  convolving  the  large 
arrays  met  in  picture  processing  and  electron  microscopy  in 
particular.  If  a method  of  implementing  arithmetic  modulo 
three  bit  primes  comparable  to  that  already  developed  for 
implementing  arithmetic  modulo  a Fermat  number,  could  be 
found,  very  efficient  fast  convolution  would  become  possible 
for  a very  large  range  of  array  sizes. 

Rader  and  Brenner  [41]  have  introduced  an  alternative 
form  for  the  FFT.  This  new  form  has  the  advantage  that  none 
of  the  multipliers  is  complex,  but  in  the  usual  complex 
field,  most  are  pure  imaginary.  It  has  the  disadvantage 
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that  one  must  divide  by  very  small  numbers  and  therefore 
aggravate  any  quantization  noise  problems.  This  new 
algorithm  may  be  applied  to  the  complex  NTT  without  this 
disadvantage . 

Winograd  [42]  has  shown  how  to  perform  a short-length 

D.F.T.  of  length  N = p or  p (where  p is  a prime)  in  a very 

efficient  way.  Winograd' s algorithm  uses  the  fact  that 
N 

a =1  and  requires  that  all  operations  be  performed  in  a 
field.  Hence,  providing  one  chooses  the  modulus  M to  be  a 
prime  number,  one  may  perform  the  NTT  by  using  Winograd' s 
algorithm. 

A final  remark  is  in  order.  Whether  or  not  an  engineering 
method  is  useful  becomes  clear  only  when  that  method  is 
evaluated  by  the  wide  community  of  potential  users,  who 
consider  it  in  relation  to  their  needs.  Since  so  many 
engineers  and  programmers  are  not  familiar  with  number 
theory  it  is  an  open  question  whether  NTT's  algorithms  will 
ultimately  prove  to  be  of  great  or  small  importance. 
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APPENDIX  A 


TWO  DIMENSIONAL  CONVOLUTION  FOR  CONVOLVING  LONG  SEQUENCES 

Arithmetic  mod  F (a  Fermat  number)  can  be  implemented 
using  b = 2t  bit  representation  of  integers.  In  Section  IV, 
it  has  been  shown  that  the  maximum  length  of  sequences  which 
can  be  cycled  convolved  using  the  FNT  with  a = 2 is  N = 2b 
and  therefore  the  length  of  sequences  which  can  be  convolved 
is  proportional  to  the  word  length  in  bits.  Thus  for  long 
sequences  the  word  length  requirement  may  be  excessive. 

Rader  [3]  suggested  using  a two  dimensional  convolution 
scheme  to  convolve  long  one-dimensional  sequences  and  Agarwal 
and  Burrus  [17,18]  presented  such  a two  dimensional  convolu- 
tion scheme.  Using  this  scheme,  cyclic  convolution  of 
length  N = LP  is  implemented  as  a two  dimensional  cyclic 
convolution  of  length  2LxP. 

This  two-dimensional  cyclic  convolution  can  be  implemented 

using  a two  dimensional  FNT.  Using  this  two  dimensional 

scheme,  the  word  length  required  is  proportional  to  the 

square  root  of  the  length  of  the  sequences  to  be  convolved 

2 

which  would  give  for  a maximum  sequence  length  8b  rather 
than  4b.  I.e.,  for  a computer  word  length  b = 64 

N for  a = /2  would  be  327681 

In  the  following  pages  an  example  illustrative  of  the  two 
dimensional  convolution  using  arithmetic  modulo  a Fermat 
number  and  FNT's  is  worked  out. 
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Let  x (n)  and  h(n),  n—  0,  1,  ...  N— 1,  be  two  sequences 
which  need  to  be  cyclically  convolved.  Let  N = LP. 

We  construct  two  (2lxP)  two-dimensional  sequences 
h(i,j)  and  x(K,£)  from  x(n)  and  h(n)  respectively  as 
shown  below. 


h(i,j)  = h ( j L + i - L)  (1) 

i = 0,1,  . . . , 2L-1 
j 0,1,  ...,  p— 1. 

And 


xUL+K) 

X(K,£)  = 


0 


K = 0,1,  . . . , L-l 


(2) 


K = L, L+l , . . . , 2L-1 

£=0,1,  . . . , p-1 


Example : 

Two  dimensional  convolution  using  FNT's. 

Let 

x (n)  = (2,-2, 1,0)  and  h(n)  = (1,2, 0,0) 

Using  Fermat  number  transforms  modulo  F = 17, 

x(n)  = (2,15,1,0)  and  h(n)  = (1,2, 0,0) 
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Note  that  the  second  element  is  changed  from  -2  to  15. 
Notice  that  N=4  = LxP  = 2x2. 

One  constructs  two  (2Lxp)  two  dimensional  sequences 
h(i,j)  and  x(K,£)  from  x(n)  and  h(n),  according  to  (1) 
and  (2)  . 

It  turns  out 


0 

1 

2 

1 

0 

2 

15 

0 

1 

0 

x(K,A)  = 

0 

0 

2 

0 

0 

0 

Now  taking  two  dimensional  transforms  of  h and  x yields  H 
and  X. 


p-1  2L-1 

H(m,n)  = l l h(i,j)a2^m  a^n  (3) 

j=0  i=0 


0,1, 

. . . , 2L-1 

x2L  - is 

an  element 

of 

order 

2L 

0,1, 

H 

i 

a 

• 

• 

• 

aD  - is 

an  element 

of 

order 

P 

and  a similar  expression  for  x(m,n). 

Remembering  that  one  is  using  = 17  as  the  modulus 


a_T  = a.  = 13  and  a = a_ 

2L  4 p 2 


16 
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This  can  be  found  by  noting  that  3 and  6 primitive  roots 
of  the  ring  in  consideration  will  generate  all  the  positive 
integers  in  the  ring  (2.16). 

Also,  by  (2.20),  if  a is  a root  of  order  N then 

aK  is  of  order  N/K  if  K|N  (K  divides  N) 
a is  of  order  N if  N and  K are  relatively  prime. 

Thus,  with  these  considerations  in  mind  one  finds 

a2L  = a4  = 316/4  = 34  = 13  (mod  17) 

a = 13  order  4 


ap  = a2  = 316/2  = 3 8 '=  16  (mod  17) 

a = 16  order  2 

Now  one  can  return  to  the  calculation  of  the  two  dimensional 
transforms.  The  first  term  will  be 


+ h (1, 0) 

+ h (2 , 0) 

+ h (3, 0) 


13(0) (0) 
13 (1) (0) 
13  (2)  (0) 
13 (3) (0) 


i6(°) (0) 
16 (0) (0) 
16 (0) (0) 
16(0) (0) 


+ h (0 , 1) 

+ h (1 , 1) 

+ h (2 , 1) 

+ h (3 , 1) 


13 (0)  (0)  16  (1)  (0) 
13  (D  (0)  16(1)  (0) 
13 (2)  (0)  16(1)  (0) 
13 (3) (0)  16(1)(0)§ 
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i . e. , 

H (0 , 0 ) = h (0 , 0)  + h (0 , 1)  + = 0 + l+  = 6 (mod  17) 

+ h (1, 0)  + h (1, 1)  + 0 + 2 + 

+ h (2 , 0)  + h(2,l)  + 1 + 0 + 

+ h (3 , 0)  + h ( 3,,  1 ) 2 + 0 

Now  constructing  a table  that  will  help  in  the  evaluation  of 
M(m,n) . 

N = 0 1 2 3 4 5 6 7 8 9 


notice  order  2 
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H ( 1 , 0 ) = 1 13 (2)  16(0)  + 1 13 (0)  16  (0)  + = 16  + 1 = 0 mod  17 

2 13(3)  16(0)  + 2 131  16(0)  + 8 +26 


H (2 , 0)  = h(2,0)'  13 (2)  (2)  16(0)(0)  +h(0,l)  13{0)  (2)  16  (1)  (0) 


+ h (3 , 0)  i3 (2)  16(0)(0)  + h{1/1)  i3(2)(2)  16(1)(0) 


= 1 13(4)  16(0)  + 1 13(0)  16(0)  = 1 + 1 + = 


15  (mod  17) 


+ 2 13(6)  16(0)  + 2 13(2)  16(0)  +32  + 


H (3 , 0)  = h (2 , 0)  i3(2^3)  16  (°)  (0)  +h(0fl)  13  (0)  (3)  16  d)  (0) 


+ h (3 , 0)  13{2)(3)  16(4)(0)  +h(l,l)  13(1)(3)  16(1)(0) 


= 1 13(6)  16(0)  + 1 13°  16°  = 16+1  = 


0 (mod  17) 


+ 2 139  16°  + 2 133  16°  + 26+8 


For  the  second  column. 
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H(0,1)  = h ( 2 , 0 ) 13(2)(0)  16(0)(1)  +h(0,l)  13(0)(0)  16(1)(1) 
+ h (3 , 0)  13(3)(0)  16(0)(1)  + h(l,l)  13(1)(0)  16(1)(1) 


= 1 + 16  = 0 (mod  17) 

+ 2 + 32 


H ( 1 / 1 ) 

= 

1 13(2)(1> 

16(0)(1)  + 

(1)  13(0)(1)  16(1)(1) 

+ 

2 13(3)(1) 

16(0)(1)  + 

(2)  13  ^1)  (1^  16  (1)  ^1) 

= 

16+16  = 

14  (mod  17) 

+ 

8 + 416 

• 

H(2,l) 

= 

(1)  13(2)(2)  16(0)(1) 

+ 1 13(0)(2)  16(1)(1) 

+ 

2 13(3)(2) 

16<0)(1)  + 

2 13(1)(2)  16(1)(1) 

= 

1 + 16  = 

0 (mod  17) 

+ 

32  + 512 

H (3 , 1) 

x 13 (2)  (3) 

16{0)(1>  + 

i 13(0)(3)  16(1)(1)  = 16  + 16  =16 

(mod  17) 

+ 

2 13(3)(3) 

16(0)(1)  + 

2 13  (1)  (3)  16(1)(1)  +26  + 128 
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i.e. , 


H(m,n)  = 

two  dimensional 
Fermat  transform 


6 0 
0 14 

15  0 

0 16 


The  two  dimensional  transform  of 


2 1 

15  0 

0 0 

0 0 

will  be  given  by 

p-1  2L-1 

' X (m,n)  = l l x(i, j)  a2lm 

j=0  i=0 

So 


X(0,0)  -5(0,0)  13(0)(0)  16(0)(0)  +5(0,1)  13(0)(0)  16(1)(0) 


+ x(l,0)  13  (1)  (0)  16  (0)  (0) 


= 2+1+15  = 1 (mod  17) 
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x ( 1 , 0 ) = x(0,0)  13(0)(0)  16(0)(0)  + x(0,l)  13(0)  (1)  16(1)(0) 


x(2,0) 


x(3,0) 

| 


x (0 , 1) 


x(l,l) 


+ x (1, 0)  13 


(1)  (1)  16 


(0)  (0) 


= 2 + 1 + 15(13)  = 11  (mod  17 


= x (0 , 0)  13(0)(2)  16()0(0)  + x (0 , 1)  13(0)(2)  16(1)(0) 


+ x (1 , 0)  13(1)(2)  16(0)(0) 


= 2 + 1 + (15)  (16)  = 6 (mod  17) 


= 2 13(0)(3)  16(0)(0)  + (1)  13(0)(3)  16(1)(0) 


+ 15  i3  t!)  (3)  16(0)(0) 


= 2 + 1 + (15)4  = 12  (mod  17) 

= 2 13(0><°>  16(0)(1)  + (1)  13(0)(0)  16(1)(1) 

+ 15  13(1)(0)  16(0)(1) 

= 2+16+15  16  (mod  17) 

= (2)  13(0)(1)  16(0)(1)  + (1)  13(0>(1)  16(1)(1) 
+ (15)  13(1)(1)  16(0)(1) 

= 2+16+8  = 9 (mod  17) 
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x ( 2 , 1) 


= (2)  13(0)(2)  16(0)(1)  + (1)  13(0)(2) 
+ (15)  13  (1)  (2)  16  (0)  (1) 


16(1)(1) 


J 

= 2 + 16  + (15)  (16)  = 3 (mod  17) 


x ( 3 , 1)  = (2)  13(0)(3)  16(0)(1)  + (1)  13(0)(3)  16(1)(1) 


+ (15)  13(1)(3)  16(0)(1> 


= 2 + 16  + (15) (4)  = 10  (mod  17) 

i 

So 


X (m,n) 

two  dimensional 
Fermat  transform 


1 16 

11  9 

5 3 

12  10 


Let  y(i,j)  be  two-dimensional  cyclic  convolution  of 
x and  h sequences,  then  it  can  be  proved  that  two  dimensional 
transform  of  y is  the  product  of  H(m,n)  and  X(m,n)  [17], 
defined  by 


p-1  2L-1~ 

y(ifj)  = 2¥h  % % H (m»n) x (m,n)  ot2^ml  a~3n 

n=0  m=0 

1 = 0 1 1 1 • • • f 2L-1 
j = 0,1,  . . . , p-1 
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Coming  back  to  the  example : 


y(0,0)  = H(0,0)X(0,0)13_(0)  (0)16~(0)  (0)+H(0,l)X(0,l)13-(0)  (0)16 
+ H(1,0)X(1,0)13_(0) {1)16~(0) (0)+H(l,l)X(l,l) 13“ (0) (1)16 
+ H(2,0)X(2,0) 13~ (0) {2)16-(0) (0 ) +H (2 , 1) X (2 , 1) 13_ (0 } (1) 16 
+ H(3,0)X(3,0)13~(0) (3)16~(?) (0) +H  (3 , 1)  X ( 3 , 1)  13"  (0) (0)16 

i.e. , 

y(0f0)  = (6)  (1)  (1)  (1)  + (14)  (4)  (1)  (1)  + = 10  (mod  17) 

+ (15)  (5)  (1)  (1)  + (16)  (10)  (1)  (1) 

y (1/ 0)  = (6)  (1) 13"  (1)  (0) 16' (0)  (0)  + (14)  (9)13~(1)  (1)16-(0)  (1) 

+ (15)  (5)13_(2)  (1)16"(0)  (0)+  (16)  (10)13'(1)  (3)16'(0)  (1) 

= 6+126  13"1  + 75  13~2  + 160  13~3 

= 16  (mod  17) 

Notice  that  in  this  last  calculation  use  of  the  following 
table  has  been  made: 

13"1  = ? 13  x 13_1  = 1 (mod  17)  = 1 13_1  = 4 


(0) (0) 

(0) (1) 

(0)  (1) 

(0) (1) 
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So 


13-1 

= 

4 

also 

16_1 

= 

16 

13'2 

= 

42 

= 

16 

16-2 

= 

1 

13'3 

= 

43 

= 

13 

16~3 

= 

16 

13‘4 

= 

44 

= 

1 

16~4 

= 

1 

13-5 

= 

45 

= 

4 order  4 

IS'5 

= 

16 

13-6 

= 

46 

= 

16 

16~6 

= 

1 

13-7 

= 

47 

= 

13 

16'7 

= 

16 

13-8 

= 

4 8 

= 

1 

16’8 

= 

1 

13'9 



49 

_ 

4 

16’9 



16 

Continuing  with  the  example: 

y (2 , 0)  - (6)  (1)  13-(2)(0)l6-(0)(°)  + (14)  (9)  13-(2)(D16-(0)(1) 

+ (15)  (5)  13"<2)  (2)16-(0)  (0)  + (16)  (10)13_(2)  (3)16_(0)  {1) 

= 16  (mod  17) 

y (3, 0)  = (6)  (1)  13-(3)(0)16-(0)(0)  + (14)  (9)  13~ (3) (1) lfi- (0) (1) 

+ (15) (5)  13-<3)(2)16-(0)(0)  + (16)(10)l3-(3)(3)16-(0)(l) 

= 16  (mod  17) 
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and  for  the  second  column, 

y (0 , 1)  = (6)  (1)  13*  (°)  (0)  16~  (1)  (0)  + (14)  (9)  13_  (0)  (1)  16_  (1)  (1) 

+ (15)  (5)  13_(0)  (2)16"(1)  (0)  + (16)  (10)13_(0)  (3)16_(1)  (1) 

= 16  (mod  17) 

y (1,1)  = (6)  (1)  13-(1)  (0)16-(1)  (0)  + (14)  (9)  13_(1)  (1)16_(1)  (1) 

+ (15)  (5)  13-(1)  (2)16-(1)  (0)  + (16)  (10)13_(1)  (3)16_(1)  {1) 

= 16  (mod  17) 

y (2 , 1)  = (6)  (1)  i3~(2)  (°)  i6~  (°)  + (14)  (9)  i3“(2)  (4)i6-(1)  (1) 

+ (15)  (5)13-(2)  (2)16~(1)  (0)  + (16)  (10)13-(2)  (3) 16_(1)  (1) 

= 10  (mod  17) 

y (3, 1)  = (6)  (1)  13_(3)  (0)16“(1)  (0)  + (14)  (9)  13~ {3)  (1) 16~ (1)  (1) 

+ (15)  (5)13-(3)  (2)16~(1)  (0)  + (16)  (10) 13“  (3)  (3) 16~  (1)  (1) 

= 16  (mod  17) 

Finally, 
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10 


16 


16 
10 
16 

and  since 

2^;  = IN  = (2N)"1  = (8_1)  = 15  (mod  17) 

we  have 

14  2 

2 2 

2 14 

2 2 

And,  since  it  can  be  proved  that  the  relationship  between 
two  dimensional  convolution  and  one  dimensional  convolution 
is  given  by  [18] : 

y (i+L, j ) = y ( jL+i) 

where 

i ~ 0,1,  • • • , L 1 
j = 0,1,  . . . , p-1. 


(mod  17) 


10 

16 

16 

16 

y (i , j ) = 15 

= 

16 

10 

16 

16 
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In  this  example : 


© 

i = 

0, 

u. 

II 

o 

y (i+L,i) 

y (0+2 , 0)  = y (2 , 0) 

y (jL+i) 

y (0  • 2+0)  = y (0) 

i.e.  y (2 , 0)  = y(0) 

= 2 

© 

i = 

1# 

j-  0 

y (1+2 , 0)  = y ( 3, 0) 

y (0  • 2+1)  = y (1) 

i.e.  y (3,0)  = y (1) 

= 2 

© 

i = 

0, 

j = 1 

y (0+2 , 1)  = y (2 , 1) 

y (1 • 2+0)  = y (2) 

i.e.  y (2 , 1)  = Y(2) 

= 14 

© 

i = 

1, 

j = 1 

y (1+2,1)  = y (3 , 1) 

y (1  • 2+1)  = y (3) 

• 

i.e.  y ( 3 , 1)  = y(3) 

= 2 

Note  that  for  the  original  sequences 

x(n)  = (0,1,15,2)  and  h(n)  = (1,2, 0,0) 

y(n)  = x(n)  * h(n)  = 2,2,14,2. 

Exactly  the  same  result!! 

In  taking  two  dimensional  transforms,  p and  2L  are 
restricted  to  be  a power  of  2_  and  also  p <_  4b  and  2L  <_  4b 
(b  = 2t  bit  representation  of  integers  in  arithmetic  modulo 
Ffc  Fermat  number)  in  the  example: 
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Since  Ffc  = 2b  + 1 = 17  b = 4 t = 2 , 

also  p = 2 , L = 2 . 

Notice  that  N = PL  <_  8b2. 

Thus  the  length  of  the  sequences  that  can  be  convolved 
using  two  dimensional  convolution  scheme  is  proportional 
to  the  square  of  the  number  of  bits  used  in  the  word  length. 

The  order  in  which  two  dimensional  transforms  or  inverse 
transforms  is  taken  is  reversible.  But  there  is  some  compu- 
tational advantage  in  taking  transform  first  along  the 
direction  2 (length  p)  and  then  along  the  direction  1 (length 
2L) , because  half  the  x sequences  along  direction  2 (p)  are 
zero  and  half  the  n sequences  along  direction  2 are  cyclic 
shifts  by  one  position  of  the  other  half  n sequences  [18]  .’ 

Also  while  taking  the  inverse  transform,  computationally 
it  is  advantageous  to  first  take  the  inverse  transform 
along  direction  1,  then  along  direction  2 [18].  Because 
we  need  only  half  the  y sequences  along  direction  2, 
therefore  after  taking  inverse  transforms  along  direction 
1,  we  can  throw  away  half  the  terms. 
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APPENDIX  B 


BASIC  PROPERTIES  OF  QUADRATIC  RESIDUES 


x = a (mod  p) 


(B . 1) 


be  a congruence,  where  p is  any  odd  prime  and  a is  any 
integer.  If  a = 0 (mod  p) , then  the  only  solution  to  (B.l) 
is  x = 0 (mod  p) . Therefore  one  assumes  p | a.  For  some 
values  of  a,  (B.l)  will  have  a solution,  whereas  for  some 
other  values  of  a,  (B.l)  will  have  no  solution. 


Definition  B.l;  Let  £ be  a prime,  and  let  a be  any 
integer  such  that  p J(  a.  One  says  that  a is  a quadratic 
residue  modulo  p provided  that 


2 

x = a (mod  p) 


(B.  2) 


has  a solution.  Otherwise,  one  says  that  a is  a quadratic 
nonresidue  modulo  p. 

Suppose  that  p is  given  and  consider  the  problem  of 

determining  all  quadratic  residues  modulo  p.  If  a is  a 

2 

quadratic  residue  modulo  p,  then  p J(  a and  a = x (mod  p) 
for  some  x.  However,  since  any  integer  is  congruent  to  one 
of  0,1,  ...,  p-1 (mod  p) , one  sees  that  a must  be  congruent 
to  one  of 
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UNCLASSIFIED 
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NL 


3*3 

*8'- 3 1 1 


(B.  3) 


l2,  22 , . ..,  (p-1)2  (mod  p) 

If  p is  not  too  large,  then  this  procedure  can  actually  be 
used  for  computation. 

Example:  Let  p = 13 

Then  a is  a quadratic  residue  modulo  13  if  and  only  if  a 
is  congruent  to  one  of 


l2,  22 , 122  (mod  13)  ; 

that  is  a is  a quadratic  residue  modulo  13  if  and  only  if 
a = 1,4,9,3,12,10,10,12,3,9,4,1  (mod  13). 

Thus  the  quadratic  residues  of  13  are 

1,  3,  4,  9,  10,  12. 

Hence  the  quadratic  nonresidues  modulo  13  are 

2,  5,  6,  7,  8,  11. 


Notice  that  the  initial  list  of  quadratic  residues 
obtained  is  symmetric,  with  each  element  of  the  list 
appearing  exactly  twice.  This  is  a general  phenomenon. 
Indeed,  one  has 


p-x  = ~x  (mod  p) 


(B.4) 
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so  that 


(p-x) 2 


(-x)  mod  p 


(B.  5) 


and  thus 

(p-x)2  = x2  (mod  p)  (B.6) 

therefore,  if  a is  a quadratic  residue  modulo  p,  then  a 
is  congruent  to  one  of 

l2,  22,  (E^i)2  mod  p (B. 7) 

J 

No  two  integers  of  (B.7)  are  congruent  modulo  p.  Hence, 
among  the  integers 

1,  2,  . • . , p— 1 

precisely  (p-l)/2  are  quadratic  residues  modulo  p and 
precisely  (p-l)/2  are  quadratic  nonresidues  modulo  p.  This 
can  be  verified  in  the  above  example.  One  finds  the 
following  notation  very  convenient: 

Definition  B.2: 

Let  p be  an  odd  prime  and  a an  integer  such  that  p / a. 

The  Legendre  symbol  is  defined  as  follows: 
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# - < 


+1  if  a is  a quadratic  residue  modulo  p 


-1  if  a is  a quadratic  nonresidue  modulo  p 


(B. 


The  Legendre  symbol  (^)  should  not  be  confused  with  the 
fraction  a/p. 


Example : 


Let  p 

•1  , 


(rr) 


(rr) 


1 ' (TI) 

-i 


i , 


See  above  example,  where  it  is  listed  the  quadratic  residues 
modulo  13,  and  quadratic  nonresidues  mod  13.  Moreover, 
since 


18  = 5 (mod  13) 

and  5 is  a quadratic  nonresidue  modulo  13.  So  is  18,  and 
thus 


,18» 

(TJ) 


-1 


Properties  of  Legendre  symbol. 

Let  p be  an  odd  prime  end  let  a and  b be  integers  such 
that  p J a and  p J(  b.  Then  the  following  results  hold 
1 20]  l 
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(i)  <^-> 


(ii)  (i) 


(B.9) 


(iii)  If  a = b mod  p then  (— ) = (— ) 

P P 


Proof : 


2 2 

(i)  The  congruence  x = a mod  p has  as  a 
solution  x = a. 


(ii)  Set  a - 1 in  result  (i) . 


(iii)  If  a e b(mod  p) , then  the  solutions  x^  = a (mod  p) 

2 

are  the  same  as  the  solutions  of  x = b (mod  p) . 
Therefore,  tne  first  congruence  has  solutions 
if  and  only  if  the  second  does.  Thus, 


The  properties  of  the  Legendre  symbol  given  above  are  very 
elementary.  However  a property  of  the  symbol  which  is  by 
no  means  obvious  is  the  following  result: 


Theorem  8.3:  (Euler's  Criterion): 

Let  £ be  an  odd  prime  and  let  a be  an  integer  such  that 
p / a.  Then 
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(|)  = a(p’1)/2  (mod  p) 


(B. 10) 


Proof  s 


By  Fermat's  theorem  (2.14 J,  one  has 


(p-D/2,2  _ _P“1  = 


ar  =1  (mod  p) 


Thus,  if 


then 


and  so 


Therefore 


h - a^1*/2 


h = 1 mod  p 


p|  (h-1)  (h+1) 


P|h-1, 


p|h+l# 
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and  hence 


h = a^P-1)/2  = ± i (mod  p) . 

Now  if  p is  odd,  so  (^)  - +1  if  and  only  if  a(P-1)/2  = i (mod  p) . 

Consequently,  (^)  • ± 1 if  and  only  if 

P % 

a (P“D  /2  = + i (rood  p) 

respectively. 

Corollary  B.4: 

Let  p be  an  odd  prime,  and  let  a and  b be  integers  such 
that  p | a,  p )f  b.  Then 

<f>  - <f>  <£>  ' <B.u> 


Proof  s 

By  Euler's  Criterion 

if)  - (ab)  (p-1,/2  - ,IP-l)/2b(P-1)/2  = (|)  <|>  mod  p. 

It  is  an  immediate  consequence  of  Corollary  (B.4) 

that 

(i)  the  product  of  two  quadratic  residue  modulo  p is 
a quadratic  residue  modulo  p. 

(ii)  the  product  of  two  quadratic  nonresidue  modulo  p 
is  a quadratic  residue  modulo  p, 


and 


(iii)  the  product  of  a quadratic  residue  and  a 

quadratic  nonresidue  is  a quadratic  nonresidue. 

Example:  Let  p ■ 13 

By  previous  calculations 

3 and  12  are  quadratic  residues  mod  13 

and 

2 

3*12  * 36  ■ 6 (mod  13)  is  a quadratic  residue. 

However 

2 and  5 are  quadratic  nonresidues  mod  13 

and 

2*5  ■ 10  ■ 62  (mod  13)  is  a quadratic  residue  mod  13. 
Finally 

7 is  a quadratic  nonresidue  mod  13 
10  is  a quadratic  residue  rood  13 
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and 


7*10  = 70  5 (mod  13)  is  a nonresidue. 


Another  consequence  of  Euler's  criterion  is  the  following: 
Corollary  8.5: 

Let  p be  an  odd  prime.  Then 


<_!)  (P-D/2 


In  other  words, 

f +1  if  p = 1 (mod  4) 


[ -1  if  p = 3 (mod  4) 


Proof : 

By  Euler's  criterion, 

(~)  s (-1)  (p“1)/2  (mod  p) 

Therefore,  since  (— ) - ±1  and  since  p > 2,  we  have  the 

P 

desired  result. 

Notice  that 

" (T^  (T}  " ('p1)  “ (-D (P_1)/2 


J 


i 
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a 2 2 

since  ( — ) « 1 by  (B.9),  therefore#  x = -a  (mod  p)  is 
P 

solvable  if  and  only  if  p = 1 (mod  4) . 

Example : 

Let 

x2  s 19  (mod  23) 


Now 


So 


19  = -4  (mod  23) 


,19> 

{J1] 


(IT) 


.-1.  ,2,2  . 

(2T}  (IT)  = _1 


Since  23  = 3 (mod  4).  Thus  x = 19  (mod  23)  is  not  solvable. 
How  does  one  go  about  computing  (^)  for  p j(  a? 

r 

Suppose  that 


al  at 

±Pl  ...  Pt 


where  p^#  pt  are  distinct  primes.  Since  p / a,  one 

sees  that  p ^ p^.  Then  by  Corollary  4#  one  has 


a ±1  Pi  *1  Pt  at 

<f>  - (-±>  1 ...  (f> 
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